
Wave 2: PanelProfile outcome/dose shape extensions + autonomous-guide worked examples (#366)

Merged
igerber merged 20 commits into main from agent-profile-shape-extensions
Apr 25, 2026
Conversation

@igerber
Owner

@igerber igerber commented Apr 24, 2026

Summary

  • Extend PanelProfile with outcome_shape: Optional[OutcomeShape] (numeric outcomes) and treatment_dose: Optional[TreatmentDoseShape] (continuous treatments). New fields populate distributional facts that gate WooldridgeDiD (QMLE) and ContinuousDiD prerequisites pre-fit.
  • Add three end-to-end worked examples to llms-autonomous.txt (binary-staggered with never-treated units, continuous-dose with zero baseline, count-shaped outcome). The new §5 sits between the existing §4 (reasoning) and the former §5 (post-fit validation); the existing §5–§8 are renumbered §6–§9.
  • Top-level exports: OutcomeShape, TreatmentDoseShape from diff_diff. Both new dataclasses are frozen and JSON-serializable via PanelProfile.to_dict().
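The two new dataclasses can be sketched roughly as follows. The field names here are assembled from the discussion in this thread, not copied from the diff_diff source, so treat this as an illustrative shape rather than the shipped definition:

```python
# Hypothetical sketch of the two frozen, JSON-serializable shape dataclasses.
# Field names are inferred from this PR discussion, not the real diff_diff API.
from dataclasses import dataclass, asdict
from typing import Optional
import json

@dataclass(frozen=True)
class OutcomeShape:
    is_integer_valued: bool
    is_count_like: Optional[bool]  # None when the heuristic is inapplicable
    pct_zeros: float
    skewness: float

@dataclass(frozen=True)
class TreatmentDoseShape:
    n_distinct_doses: int
    has_zero_dose: bool
    dose_min: float
    dose_max: float
    dose_mean: float

shape = OutcomeShape(True, True, 0.35, 1.8)
payload = json.dumps(asdict(shape))  # serializable, as PanelProfile.to_dict() requires
print(payload)
```

Freezing the dataclasses means profile consumers cannot mutate distributional facts after profiling, which is what makes the JSON roundtrip test meaningful.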

Methodology references (required if estimator / math changes)

  • Method name(s): profile_panel is descriptive (not an estimator); the worked examples reference CallawaySantAnna, WooldridgeDiD, ContinuousDiD. No estimator math is changed in this PR.
  • Paper / source link(s): no new methodology claims beyond what is already in docs/methodology/REGISTRY.md.
  • Any intentional deviations from the source: None. The is_time_invariant field uses exact distinct-count on observed non-zero doses (matching the documented contract and ContinuousDiD.fit()'s exact equality check). The is_count_like and is_integer_valued fields are explicitly labeled heuristics in the docstring + guide.

Validation

  • Tests added/updated: 17 new unit tests in tests/test_profile_panel.py covering each shape heuristic (count-like Poisson, binary-as-not-count-like, continuous normal, bounded unit, categorical returning None, skewness/kurtosis gating, JSON roundtrip, time-invariant dose, time-varying dose, no-zero-dose, binary/categorical treatment returning None, frozen invariants on both new dataclasses, sub-1e-8 dose-precision regression). 2 new content-stability tests in tests/test_guides.py guard the §5 worked-examples block and the new field references.
  • Backtest / simulation / notebook evidence: N/A (descriptive utility + documentation).

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

@github-actions

Overall assessment

⚠️ Needs changes — one unmitigated P1 methodology/contract mismatch in the new ContinuousDiD pre-fit gating surface.

Executive summary

  • P1: The new TreatmentDoseShape fields are documented as the ContinuousDiD prerequisites, but they do not match what ContinuousDiD.fit() actually enforces. That can mark ineligible panels as eligible pre-fit.
  • The mismatch is internal as well as methodological: the existing autonomous guide still says ContinuousDiD requires has_never_treated and treatment_varies_within_unit == False, while the new section shifts the gate to row-level has_zero_dose and nonzero-only is_time_invariant.
  • P2: The new worked example uses WooldridgeDiD(family="poisson"), but the public API is WooldridgeDiD(method="poisson"). The new guide test hard-codes the wrong call.
  • The new tests do not cover the two cases that would have exposed the P1: 0,0,d,d within-unit dose paths, and row-level zeros without any never-treated units.
  • I did not find a mitigating entry for the P1 in TODO.md. This was a static source review; I could not execute the code in this environment.

Methodology

  • Severity: P1
    Impact: ContinuousDiD is still implemented as requiring unit-level never-treated controls (first_treat == 0) and full per-unit dose constancy, but the new profile surface defines has_zero_dose as “any zero row” and is_time_invariant on nonzero doses only, then states those new fields are the prerequisites. That is an undocumented mismatch between the shipped pre-fit guidance and the estimator contract. A panel with pre-period zeros and no never-treated units, or a 0,0,d,d dose path, can satisfy the new guide’s gate while ContinuousDiD.fit() still raises.
    Concrete fix: Either align _compute_treatment_dose() to the actual estimator contract, or keep the fields descriptive and stop claiming they alone gate ContinuousDiD. In either case, keep has_never_treated and full-path time-invariance as the authoritative prerequisites unless you also change ContinuousDiD.fit() and document that deviation in REGISTRY.md.
    Locations: diff_diff/profile.py:L656-L710, diff_diff/continuous_did.py:L222-L227, diff_diff/continuous_did.py:L348-L360, diff_diff/guides/llms-autonomous.txt:L115-L140, diff_diff/guides/llms-autonomous.txt:L198-L211, diff_diff/guides/llms-autonomous.txt:L690-L723, docs/methodology/REGISTRY.md:L719-L745

Code Quality

  • No findings in the changed Python implementation beyond the methodology contract issue above.

Performance

  • No findings.

Maintainability

  • No separate findings.

Tech Debt

  • No separate findings.
  • The P1 above is not mitigated by TODO.md; I did not find an existing tracked entry for it.

Security

  • No findings.

Documentation/Tests

  • Severity: P2
    Impact: The new autonomous guide tells users/agents to call WooldridgeDiD(family="poisson"), but the estimator constructor accepts method, not family. Anyone following the example will get a runtime error, and the new guide test now locks that bad example in place.
    Concrete fix: Change the guide and the stability test to WooldridgeDiD(method="poisson").
    Locations: diff_diff/guides/llms-autonomous.txt:L767-L771, diff_diff/wooldridge.py:L273-L275, tests/test_guides.py:L62-L64

  • Severity: P2
    Impact: The new tests only cover “constant dose in every row” and “changing nonzero dose,” so they miss the exact contradictory cases introduced by the new guidance: 0,0,d,d within-unit paths and zero rows without any never-treated units. That leaves the P1 above unguarded.
    Concrete fix: Add regression tests asserting consistency between profile_panel guidance and ContinuousDiD.fit() for:

    1. balanced panel with 0,0,d,d dose paths;
    2. panel with row-level zeros but no never-treated units.
      Locations: tests/test_profile_panel.py:L1028-L1105
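The second contradictory case can be made concrete with a toy long-format panel (made-up rows, illustrative only): zero-dose rows can exist even when no unit is never-treated, so the row-level and unit-level facts are distinct.

```python
# Toy long-format rows: every unit is eventually treated, but each has a
# pre-period zero-dose row. (Made-up data for illustration only.)
rows = [  # (unit, time, dose)
    ("u1", 1, 0.0), ("u1", 2, 2.0),
    ("u2", 1, 0.0), ("u2", 2, 3.0),
]

has_zero_dose = any(d == 0 for _, _, d in rows)  # row-level fact
never_treated_units = {u for u, _, _ in rows} - {u for u, _, d in rows if d != 0}
has_never_treated = bool(never_treated_units)    # unit-level fact

print(has_zero_dose, has_never_treated)  # True False -- the facts diverge
```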

Path to Approval

  1. Make the new TreatmentDoseShape contract consistent with ContinuousDiD.fit() and the Methodology Registry, or explicitly scope it as descriptive-only and restore the existing prerequisites (has_never_treated, full-path time invariance) in the guide.
  2. Add regression tests for the two contradictory continuous-dose scenarios: 0,0,d,d paths and row-level zeros without never-treated controls.
  3. Fix the autonomous guide and its stability test to use WooldridgeDiD(method="poisson").

igerber added a commit that referenced this pull request Apr 24, 2026
…ptive-only; fix WooldridgeDiD method kwarg

P1 (TreatmentDoseShape vs ContinuousDiD contract):
- Reviewer correctly flagged that the new `is_time_invariant` field
  (per-unit non-zero distinct-count) does NOT match the actual
  `ContinuousDiD.fit()` gate at `continuous_did.py:222-228`, which
  uses `df.groupby(unit)[dose].nunique() > 1` over the FULL dose
  column (including pre-treatment zeros). My nonzero-only check
  silently classified `0,0,d,d` paths as time-invariant while
  ContinuousDiD would reject them.
- Removed `is_time_invariant` field from `TreatmentDoseShape`
  entirely. The pre-existing `PanelProfile.treatment_varies_within_unit`
  field already encodes the correct ContinuousDiD prerequisite (matches
  the estimator's nunique check at line 224) and is correctly documented
  in §2 of the autonomous guide. Adding a second, narrower, mismatched
  gate was confusing - the reviewer's "scope as descriptive-only" path
  is the cleaner fix.
- Reframed `TreatmentDoseShape` docstring + autonomous guide §2
  field reference: explicitly NOT a ContinuousDiD prerequisite.
  `n_distinct_doses`, `has_zero_dose`, `dose_min/max/mean` provide
  descriptive distributional context; `has_never_treated` (unit-level)
  + `treatment_varies_within_unit == False` (full-path constancy) +
  `is_balanced` are the authoritative gates.
- Rewrote §5.2 worked example reasoning chain to use the existing
  correct gates and added a counter-example showing
  `has_zero_dose=True` does NOT imply `has_never_treated=True` (the
  row-level vs unit-level distinction).
- Added `test_treatment_dose_does_not_gate_continuous_did` covering
  the two contradictory cases the reviewer named: (1) `0,0,d,d`
  within-unit dose path, asserting `treatment_varies_within_unit=True`
  (the actual ContinuousDiD gate fires correctly); (2) row-level zeros
  without never-treated units, asserting `has_zero_dose=True` BUT
  `has_never_treated=False` (the two facts are distinct).
- Removed `test_treatment_dose_continuous_time_varying_within_unit`
  and `test_treatment_dose_distinguishes_doses_at_high_precision` -
  both tested the dropped `is_time_invariant` field.
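The mismatch described above can be reproduced with a toy 0,0,d,d dose path in plain Python (a stand-in for the library's pandas nunique check, not the actual implementation):

```python
# A unit whose dose path is 0,0,d,d: zero in pre-periods, constant d after.
dose_path = [0.0, 0.0, 2.5, 2.5]

# The dropped nonzero-only check: distinct count over non-zero doses.
nonzero_distinct = len({d for d in dose_path if d != 0})

# The ContinuousDiD-style check: distinct count over the FULL dose column
# (equivalent in spirit to groupby(unit)[dose].nunique() > 1).
full_distinct = len(set(dose_path))

print(nonzero_distinct)  # 1 -> nonzero-only check calls this "time-invariant"
print(full_distinct)     # 2 -> full-path check flags within-unit variation
```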

P2 (WooldridgeDiD constructor kwarg):
- The autonomous guide §5.3 worked example used
  `WooldridgeDiD(family="poisson")` but the actual constructor at
  `wooldridge.py:264` takes `method=`. Following the example would
  raise `TypeError: __init__() got an unexpected keyword argument
  'family'`. Fixed in two places (the prose and the code snippet)
  and added a negative assertion in `test_guides.py` to prevent
  regression: `assert 'WooldridgeDiD(family="poisson")' not in text`.
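The failure mode is an ordinary keyword-argument mismatch; a stand-in class (not the real WooldridgeDiD) reproduces it:

```python
# Stand-in for the estimator constructor; the real class lives in
# diff_diff/wooldridge.py and takes `method=`, not `family=`.
class WooldridgeDiD:
    def __init__(self, method="ols"):
        self.method = method

est = WooldridgeDiD(method="poisson")  # the documented spelling
try:
    WooldridgeDiD(family="poisson")    # the kwarg the guide wrongly used
except TypeError as exc:
    print(type(exc).__name__)          # TypeError
```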

CHANGELOG updated to reflect the revised TreatmentDoseShape scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 24, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: cc91a7dad85dbb36ec713b2deedeae7d0938e3bc


Overall Assessment
Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • The prior re-review P1 is resolved: the new TreatmentDoseShape surface is now explicitly descriptive-only, and the guide/tests defer to the same ContinuousDiD gates the estimator and Methodology Registry enforce: has_never_treated, treatment_varies_within_unit, and is_balanced. diff_diff/profile.py:L64-L83, diff_diff/profile.py:L658-L697, diff_diff/continuous_did.py:L222-L360, docs/methodology/REGISTRY.md:L717-L745, diff_diff/guides/llms-autonomous.txt:L198-L220, diff_diff/guides/llms-autonomous.txt:L499-L507, diff_diff/guides/llms-autonomous.txt:L732-L757, tests/test_profile_panel.py:L1057-L1115
  • The prior re-review P2 is resolved: the worked example now uses WooldridgeDiD(method="poisson"), matching the public API and the Wooldridge documentation. diff_diff/guides/llms-autonomous.txt:L794-L805, diff_diff/wooldridge.py:L271-L276, docs/methodology/REGISTRY.md:L1333-L1349, tests/test_guides.py:L50-L71
  • I did not find estimator-math, weighting, variance/SE, identification-assumption, or default-behavior changes in the PR; the changed surface is descriptive profiling, exports, guide content, and tests. diff_diff/profile.py:L43-L200, diff_diff/__init__.py:L253-L259, diff_diff/guides/llms-autonomous.txt:L637-L811
  • P3: ROADMAP.md still overstates the new fields as if they themselves gate Wooldridge/ContinuousDiD routing, which conflicts with the corrected guide/docstrings. ROADMAP.md:L141-L141
  • P3: the new top-level exports lack a matching regression in the existing top-level import-surface test. diff_diff/__init__.py:L253-L259, diff_diff/__init__.py:L503-L508, tests/test_profile_panel.py:L404-L413
  • Static source review only: I could not execute tests here because this environment is missing pytest and even numpy.

Methodology

  • No findings. The previous ContinuousDiD contract mismatch is fixed and now matches both the estimator implementation and REGISTRY.md. diff_diff/profile.py:L64-L83, diff_diff/continuous_did.py:L222-L360, docs/methodology/REGISTRY.md:L717-L745
  • No findings. The previous Wooldridge guide/API mismatch is fixed. diff_diff/guides/llms-autonomous.txt:L794-L805, diff_diff/wooldridge.py:L271-L276

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • P3 — ROADMAP.md still says the new extensions expose facts that “gate WooldridgeDiD QMLE / ContinuousDiD prerequisites pre-fit,” which reintroduces the exact conceptual confusion the guide/docstrings just corrected. Reword it to say the new fields add descriptive outcome/dose-shape context, while the authoritative ContinuousDiD gates remain has_never_treated, treatment_varies_within_unit, and is_balanced, and outcome_shape.is_count_like only informs Poisson-QMLE consideration. ROADMAP.md:L141-L141; compare diff_diff/profile.py:L64-L83, diff_diff/guides/llms-autonomous.txt:L198-L220, diff_diff/guides/llms-autonomous.txt:L499-L507

Tech Debt

  • No findings. I do not see a new correctness issue that should be deferred into TODO.md.

Security

  • No findings.

Documentation/Tests

  • P3 — The PR adds OutcomeShape and TreatmentDoseShape to the public top-level namespace, but test_top_level_import_surface() still only covers profile_panel, PanelProfile, and Alert; extend that test to assert diff_diff.OutcomeShape, diff_diff.TreatmentDoseShape, and their presence in diff_diff.__all__. diff_diff/__init__.py:L253-L259, diff_diff/__init__.py:L503-L508, tests/test_profile_panel.py:L404-L413
  • Review note: I could not run the added tests in this environment because pytest is unavailable and importing the package runtime also fails for lack of numpy.
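The requested import-surface regression follows a standard pattern; the sketch below demonstrates it against the stdlib json module, since diff_diff itself is not importable here:

```python
# Generic import-surface regression pattern (stand-in target: the stdlib
# json module; the real test would target diff_diff and its __all__).
import json

def check_import_surface(module, names):
    # Every public name must be importable AND advertised in __all__.
    for name in names:
        assert hasattr(module, name), f"missing export: {name}"
        assert name in module.__all__, f"not in __all__: {name}"
    return True

print(check_import_surface(json, ["dumps", "loads", "JSONDecoder"]))
```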

igerber added a commit that referenced this pull request Apr 24, 2026
…corrected scope; cover new exports in import-surface test

P3 #1 (ROADMAP wording drift):
ROADMAP.md still said the new fields "gate WooldridgeDiD QMLE /
ContinuousDiD prerequisites pre-fit" and mentioned "time-invariance",
which contradicted the round-1 corrections to TreatmentDoseShape's
docstring + autonomous guide §2 + §5.2. Reworded to match: the new
fields add descriptive distributional context only;
`outcome_shape.is_count_like` informs (not gates) the WooldridgeDiD
QMLE judgment, and the authoritative ContinuousDiD pre-fit gates
remain `has_never_treated`, `treatment_varies_within_unit`, and
`is_balanced`. "Time-invariance" wording removed (the field was
dropped in round 1).

P3 #2 (import-surface test coverage):
`test_top_level_import_surface()` previously only verified
`profile_panel`, `PanelProfile`, `Alert`. Extended to also cover the
two new public exports `OutcomeShape` and `TreatmentDoseShape`,
asserting both their importability and their presence in
`diff_diff.__all__`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 24, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 370915efd4056efc0cfe882a96041818a5d62239


Overall assessment

Looks good — no unmitigated P0/P1 findings. The prior substantive re-review issues I checked are resolved.

Executive summary

  • The earlier ContinuousDiD contract concern is fixed: the new treatment_dose surface is now explicitly descriptive-only, and the guide points back to the same field-based gates enforced in ContinuousDiD.fit() and documented in the registry. diff_diff/profile.py:L64-L89, diff_diff/guides/llms-autonomous.txt:L198-L220, diff_diff/guides/llms-autonomous.txt:L499-L507, diff_diff/continuous_did.py:L222-L360, docs/methodology/REGISTRY.md:L717-L745
  • The earlier Wooldridge worked-example/API mismatch is fixed: the guide now uses WooldridgeDiD(method="poisson"), matching the public API and the Wooldridge registry entry. diff_diff/guides/llms-autonomous.txt:L794-L805, diff_diff/wooldridge.py:L271-L275, docs/methodology/REGISTRY.md:L1333-L1349
  • The earlier top-level export/test gap is fixed: OutcomeShape and TreatmentDoseShape are exported from diff_diff and covered by the import-surface regression. diff_diff/__init__.py:L253-L259, diff_diff/__init__.py:L503-L510, tests/test_profile_panel.py:L404-L422
  • P3: the new “authoritative ContinuousDiD pre-fit gates” wording now sounds exhaustive but omits the separate duplicate-cell hard stop already documented elsewhere; _precompute_structures() still resolves duplicate (unit, time) rows by last-row-wins overwrite. diff_diff/guides/llms-autonomous.txt:L330-L343, diff_diff/guides/llms-autonomous.txt:L499-L507, diff_diff/continuous_did.py:L818-L823, diff_diff/profile.py:L69-L75, CHANGELOG.md:L11-L11, ROADMAP.md:L141-L141
  • Static review only: I could not run tests here because this environment is missing pytest, numpy, and pandas.

Methodology

  • No P0/P1 findings. The changed guide/docstring text now matches the current ContinuousDiD and WooldridgeDiD contracts and does not introduce an undocumented estimator-methodology deviation.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings beyond the documentation item below.

Tech Debt

  • No findings. I did not see a related mitigation entry needed in TODO.md.

Security

  • No findings.

Documentation/Tests

  • P3 — Impact: the new ContinuousDiD prerequisite summary omits the already-documented duplicate-row hard gate, so a reader treating the new treatment_dose summary as exhaustive can still route duplicate-containing panels into a code path that silently overwrites earlier (unit, time) rows. Concrete fix: in the new TreatmentDoseShape docs and the related guide/ROADMAP/CHANGELOG wording, change “authoritative gates are ...” to “core field-based gates are ... plus no duplicate_unit_time_rows alert,” and add a regression in tests/test_guides.py that the ContinuousDiD prerequisite summary still mentions duplicates. diff_diff/guides/llms-autonomous.txt:L330-L343, diff_diff/guides/llms-autonomous.txt:L499-L507, diff_diff/continuous_did.py:L818-L823, diff_diff/profile.py:L69-L75, CHANGELOG.md:L11-L11, ROADMAP.md:L141-L141
  • Verification note: I could not execute the new tests in this environment.
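The silent-overwrite behaviour flagged in the P3 can be illustrated with a toy cell map built row by row (an illustrative stand-in for the matrix fill, not the library code):

```python
# Building a {(unit, time): outcome} cell map row by row; a duplicate
# (unit, time) key silently overwrites the earlier row -- last row wins.
rows = [
    ("u1", 1, 10.0),
    ("u1", 1, 99.0),   # duplicate (unit, time) cell
    ("u1", 2, 12.0),
]

outcome = {}
for unit, time, y in rows:
    outcome[(unit, time)] = y  # no error, no warning: earlier value is lost

print(outcome[("u1", 1)])  # 99.0
```

This is why the absence of a duplicate_unit_time_rows alert is a real prerequisite and not just a style preference: nothing downstream can detect that the 10.0 row ever existed.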

igerber added a commit that referenced this pull request Apr 25, 2026
…ontinuousDiD prerequisite summaries

Reviewer correctly noted that the round-2 wording lists
`has_never_treated` + `treatment_varies_within_unit == False` +
`is_balanced` as the "authoritative" ContinuousDiD pre-fit gates but
omits the duplicate-cell hard stop. Verified
`continuous_did.py:_precompute_structures` (line 818-823) builds
`outcome_matrix` cell-by-cell with last-row-wins on duplicate
`(unit, time)` keys - so absence of the `duplicate_unit_time_rows`
alert is also a real prerequisite, not just a style preference.

Updated wording in five places to add "+ absence of the
`duplicate_unit_time_rows` alert" alongside the other gates and
explain the silent-overwrite behavior:

- `diff_diff/profile.py` `TreatmentDoseShape` docstring
- `diff_diff/guides/llms-autonomous.txt` §2 field reference
- `diff_diff/guides/llms-autonomous.txt` §4.7 (continuous design feature)
- `diff_diff/guides/llms-autonomous.txt` §5.2 worked example reasoning
  chain (now lists four gates instead of three)
- `CHANGELOG.md` Unreleased entry
- `ROADMAP.md` AI-Agent Track building-block

Also softened "authoritative" -> "core field-based" since the
non-field-based duplicate-row gate makes the original phrasing
slightly misleading.

Added a test_guides.py regression asserting the autonomous guide
mentions `duplicate_unit_time_rows` so future wording changes can't
silently drop the gate from the summary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 22ff5dc161534d087d0aa191b74f04efd99eaa43


Overall Assessment

⚠️ Needs changes — I found two unmitigated P1 methodology issues in the new count-outcome guidance/routing.

Executive Summary

  • The prior duplicate-row documentation issue appears resolved: the ContinuousDiD prerequisite summary now keeps duplicate_unit_time_rows in scope.
  • P1: The new Wooldridge Poisson guidance says the estimator returns a multiplicative/proportional effect, but the implementation and reporting contract expose an ASF-based ATT difference on the outcome scale.
  • P1: The new OutcomeShape.is_count_like heuristic does not exclude negative support, so it can steer negative integer outcomes toward WooldridgeDiD(method="poisson"), which hard-rejects negative outcomes.
  • P2: The added tests pin section presence and API spelling, but they do not guard the nonlinear Wooldridge target-parameter wording or the negative-count edge case.
  • Static review only: I could not run the test suite because pytest is not installed in this environment.

Methodology

  • Severity: P1. Affected method: WooldridgeDiD(method="poisson"). Impact: The new guide tells agents that Poisson ETWFE “estimates the multiplicative effect directly” and that they should “report the multiplicative effect (proportional change),” but the library’s own methodology contract is ASF-based ATT, and the Poisson path computes E[exp(η_1)] - E[exp(η_0)], i.e. an outcome-scale difference, not a ratio. An agent following the new worked example can therefore misreport overall_att on the wrong scale. Concrete fix: Rewrite the §4.11 / §5.3 wording so Poisson ETWFE is described as an ASF-based ATT on the natural outcome scale; if you want to mention proportional effects, label them as a separate derived interpretation, not the estimator’s overall_att target. Refs: diff_diff/guides/llms-autonomous.txt:L621-L629, diff_diff/guides/llms-autonomous.txt:L801-L811, docs/methodology/REGISTRY.md:L1335-L1357, diff_diff/wooldridge.py:L1191-L1225, diff_diff/_reporting_helpers.py:L262-L281.
  • Severity: P1. Affected surface: profile_panel outcome-shape routing for WooldridgeDiD(method="poisson"). Impact: is_count_like currently requires integer-valued data, zeros, positive skew, and >2 distinct values, but it does not require non-negative support. That means a right-skewed integer outcome with zeros and some negative values can still set is_count_like=True, after which the new guide treats it as a Poisson candidate even though the Poisson fitter raises on any negative outcome. This is a missing edge-case guard on the new routing signal. Concrete fix: Add a non-negativity requirement to is_count_like (value_min >= 0 or equivalent), document it, and add a regression test with negative integer outcomes plus zeros expecting is_count_like=False. Refs: diff_diff/profile.py:L621-L645, diff_diff/guides/llms-autonomous.txt:L621-L629, diff_diff/wooldridge.py:L1105-L1108.
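The first P1's distinction is easy to see numerically. With made-up linear indices (all values here are hypothetical), the ASF-style scalar is a difference of averaged exp() terms on the outcome scale, and a multiplicative reading is a different quantity:

```python
# Toy numeric illustration of an ASF-style contrast E[exp(eta_1)] - E[exp(eta_0)].
# The eta values are invented for the illustration.
from math import exp

eta_1 = [1.2, 0.9, 1.5]  # hypothetical treated linear indices per cell
eta_0 = [1.0, 0.8, 1.1]  # hypothetical counterfactual untreated indices

mu_1 = sum(exp(e) for e in eta_1) / len(eta_1)
mu_0 = sum(exp(e) for e in eta_0) / len(eta_0)

att = mu_1 - mu_0    # outcome-scale difference: the reported scalar
ratio = mu_1 / mu_0  # a multiplicative reading: a different quantity

print(att != ratio - 1.0)  # True -- the two summaries are not interchangeable
```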

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. I did not see these P1 items mitigated in TODO.md, and they are not deferrable under the review policy.

Security

No findings.

Documentation/Tests

  • Severity: P2. Impact: The new tests verify the worked-examples section exists and that the Poisson example uses method= rather than family=, but they do not verify the nonlinear Wooldridge estimand wording or the negative-outcome exclusion. Both P1 issues above can therefore pass the current suite. Concrete fix: Add guide-content assertions that §4.11/§5.3 do not describe overall_att as “multiplicative” / “proportional change,” and add a profile_panel regression test for a right-skewed integer outcome with zeros and some negative values. Refs: tests/test_guides.py:L62-L83, tests/test_profile_panel.py:L887-L907.

Path to Approval

  1. Fix the Poisson ETWFE wording in diff_diff/guides/llms-autonomous.txt so it matches the ASF-based ATT contract already documented in REGISTRY.md and _reporting_helpers.py.
  2. Tighten OutcomeShape.is_count_like to require non-negative support before it can steer users toward Poisson.
  3. Add regression tests for both the estimand wording and the negative-integer edge case.

igerber added a commit that referenced this pull request Apr 25, 2026
…n estimand wording + is_count_like non-negativity guard

P1 #1 (Wooldridge Poisson estimand wording):
The guide §4.11 and §5.3 worked example described
`WooldridgeDiD(method="poisson")`'s `overall_att` as a
"multiplicative effect" / "log-link effect" / "proportional change"
to be reported. Verified against `wooldridge.py:1225`
(`att = _avg(mu_1 - mu_0, cell_mask)`) and
`_reporting_helpers.py:262-281` (registered estimand: "ASF-based
average from Wooldridge ETWFE ... average-structural-function (ASF)
contrast between treated and counterfactual untreated outcomes ...
on the natural outcome scale"): the actual quantity is
`E[exp(η_1)] - E[exp(η_0)]`, an outcome-scale DIFFERENCE, not a
multiplicative ratio. An agent following the previous wording would
misreport the headline scalar.

Rewrote both surfaces to:
- Describe the estimand as an ASF-based outcome-scale difference,
  citing `wooldridge.py:1225` and Wooldridge (2023) +
  REGISTRY.md §WooldridgeDiD nonlinear / ASF path.
- Explicitly note the headline `overall_att` is a difference on the
  natural outcome scale, NOT a multiplicative ratio.
- Mention that a proportional / percent-change interpretation can
  be derived post-hoc as `overall_att / E[Y_0]` but is not the
  estimator's reported scalar.

Added `test_autonomous_count_outcome_uses_asf_outcome_scale_estimand`
in `tests/test_guides.py`: extracts §4.11 and §5.3 blocks, asserts
forbidden phrases ("multiplicative effect under qmle", "estimates
the multiplicative effect", "multiplicative (log-link) effect",
"report the multiplicative effect", "report the multiplicative")
do NOT appear, and asserts §5.3 explicitly contains "ASF" and
"outcome scale" so future edits cannot silently weaken the
description.
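A stripped-down version of that forbidden-phrase regression looks like this (guide text inlined as a stand-in; the real test reads llms-autonomous.txt):

```python
# Stand-in guide excerpt; the real test extracts the section 5.3 block
# from diff_diff/guides/llms-autonomous.txt.
guide_text = """
5.3 Count-shaped outcome
The headline overall_att is an ASF-based difference on the outcome scale.
"""

forbidden = ["multiplicative effect", "proportional change"]
required = ["ASF", "outcome scale"]

ok = (all(p not in guide_text for p in forbidden)
      and all(r in guide_text for r in required))
print(ok)  # True
```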

P1 #2 (`is_count_like` non-negativity guard):
The `is_count_like` heuristic gated on integer-valued + has-zeros +
right-skewed + > 2 distinct values, but did NOT exclude negative
support. Verified against `wooldridge.py:1105-1109`: Poisson method
hard-rejects `y < 0` with `ValueError`. Without a value_min >= 0
guard, a right-skewed integer outcome with zeros and some negatives
would set `is_count_like=True` and steer an agent toward an
estimator that then refuses to fit.

Added `value_min >= 0.0` to the heuristic and explained the
non-negativity gate in the docstring + autonomous guide §2 field
reference (now reads
"is_integer_valued AND pct_zeros > 0 AND skewness > 0.5 AND
n_distinct_values > 2 AND value_min >= 0"). The guide also notes
that the gate exists specifically to align the routing signal with
WooldridgeDiD Poisson's hard non-negativity requirement.

Added `test_outcome_shape_count_like_excludes_negative_support` in
`tests/test_profile_panel.py` covering a Poisson-distributed outcome
with a small share of negative integers spliced in: asserts
`is_count_like=False` despite the other four conditions firing.
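Written out as a standalone toy function (not the diff_diff implementation), the five-condition heuristic described above reads:

```python
# Toy version of the five-condition count-like heuristic described in this
# commit; skewness is passed in precomputed to keep the sketch stdlib-only.
def is_count_like(values, skewness):
    return (
        all(float(v).is_integer() for v in values)  # integer-valued
        and any(v == 0 for v in values)             # pct_zeros > 0
        and skewness > 0.5                          # right-skewed
        and len(set(values)) > 2                    # > 2 distinct values
        and min(values) >= 0                        # non-negativity guard
    )

counts = [0, 1, 1, 2, 5, 0, 3]
with_negatives = [0, 1, -2, 2, 5, 0, 3]
print(is_count_like(counts, skewness=1.2))          # True
print(is_count_like(with_negatives, skewness=1.2))  # False: negative support
```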

P2 (test coverage for both P1s):
Both regressions above guard the new contracts. The guide test
guards the wording surface; the profile test guards the heuristic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: b444300244662048617cd21fc6357d5cae0b1e32


Overall Assessment

⚠️ Needs changes — the two prior Wooldridge-related P1s look resolved in the provided diff, but there is one P1 [Newly identified] in the new ContinuousDiD pre-fit routing/docs.

Executive Summary

  • The prior Poisson issues appear fixed in the provided diff: WooldridgeDiD(method="poisson") is now described as ASF/outcome-scale, and outcome_shape.is_count_like now excludes negative outcomes.
  • P1 [Newly identified]: the new treatment_dose / ContinuousDiD prerequisite summaries omit the estimator’s strictly-positive-treated-dose restriction. A panel can satisfy every documented gate and still hard-fail in ContinuousDiD.fit() on negative dose support.
  • The added tests cover the two prior Poisson regressions, but there is still no regression around negative-dose ContinuousDiD routing.
  • No findings in code quality, performance, security, or tracked-tech-debt handling.
  • Static review only: pytest is not installed in this environment, so I could not run the suite.

Methodology

Re-review note: the previous Poisson estimand-wording and negative-count-routing findings appear resolved in the provided diff.

  • Severity: P1 [Newly identified]. Impact: the PR now presents has_never_treated, treatment_varies_within_unit == False, is_balanced, and no duplicate_unit_time_rows alert as the authoritative pre-fit gates for ContinuousDiD, and §5.2 says that if those pass the estimator is “in scope.” That omits the implementation’s hard support restriction that treated doses must be strictly positive; ContinuousDiD.fit() raises on negative treated dose support, and the methodology note is written on D > 0 support. An autonomous agent can therefore route a balanced, never-treated, time-invariant panel with treatment_dose.dose_min < 0 into a deterministic fit-time error. Concrete fix: extend the new routing contract so ContinuousDiD is only in scope when nonzero doses are strictly positive as well; at minimum, document treatment_dose.dose_min > 0 as a hard gate, and preferably expose an explicit negative-dose alert/boolean. Refs: diff_diff/profile.py:L65-L85, diff_diff/guides/llms-autonomous.txt:L737-L751, diff_diff/continuous_did.py:L287-L294, docs/methodology/continuous-did.md:L180-L184.
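The extra gate proposed above amounts to a one-line check over the nonzero dose support; a hypothetical helper (illustrative, not library code):

```python
# Hypothetical routing helper: ContinuousDiD-style designs need nonzero
# doses to be strictly positive, not merely zero-supported.
def nonzero_doses_strictly_positive(doses):
    nonzero = [d for d in doses if d != 0]
    return bool(nonzero) and min(nonzero) > 0

print(nonzero_doses_strictly_positive([0.0, 0.0, 1.5, 2.0]))   # True
print(nonzero_doses_strictly_positive([0.0, -0.5, 1.5, 2.0]))  # False
```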

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. I did not find the ContinuousDiD gate omission documented as a registry deviation or tracked in TODO.md, so it is unmitigated.

Security

No findings.

Documentation/Tests

  • Severity: P2. Impact: the new tests now guard the prior Poisson regressions, but they still do not cover the expanded ContinuousDiD routing surface for negative-dose panels. The treatment-dose tests only cover zero-baseline/no-zero/binary/categorical cases, and the guide tests do not assert any negative-dose exclusion. Concrete fix: add a regression with a balanced, never-treated, time-invariant continuous panel containing a negative nonzero dose and assert the routing surface rejects ContinuousDiD (via explicit alert/boolean or guide wording keyed to dose_min < 0). Refs: tests/test_profile_panel.py:L1037-L1187, tests/test_guides.py:L62-L105.

Path to Approval

  1. Update the new TreatmentDoseShape / autonomous-guide prerequisite summaries so ContinuousDiD is only “in scope” when nonzero doses are strictly positive in addition to being zero-supported, time-invariant, balanced, and duplicate-free.
  2. Add a targeted regression for a negative-dose continuous panel that otherwise passes the existing documented gates, so this support restriction is covered going forward.

igerber force-pushed the agent-profile-shape-extensions branch from b444300 to ed57c7c on April 25, 2026 at 00:41
igerber added a commit that referenced this pull request Apr 25, 2026
…ptive-only; fix WooldridgeDiD method kwarg

P1 (TreatmentDoseShape vs ContinuousDiD contract):
- Reviewer correctly flagged that the new `is_time_invariant` field
  (per-unit non-zero distinct-count) does NOT match the actual
  `ContinuousDiD.fit()` gate at `continuous_did.py:222-228`, which
  uses `df.groupby(unit)[dose].nunique() > 1` over the FULL dose
  column (including pre-treatment zeros). My nonzero-only check
  silently classified `0,0,d,d` paths as time-invariant while
  ContinuousDiD would reject them.
- Removed `is_time_invariant` field from `TreatmentDoseShape`
  entirely. The pre-existing `PanelProfile.treatment_varies_within_unit`
  field already encodes the correct ContinuousDiD prerequisite (matches
  the estimator's nunique check at line 224) and is correctly documented
  in §2 of the autonomous guide. Adding a second, narrower, mismatched
  gate was confusing - the reviewer's "scope as descriptive-only" path
  is the cleaner fix.
- Reframed `TreatmentDoseShape` docstring + autonomous guide §2
  field reference: explicitly NOT a ContinuousDiD prerequisite.
  `n_distinct_doses`, `has_zero_dose`, `dose_min/max/mean` provide
  descriptive distributional context; `has_never_treated` (unit-level)
  + `treatment_varies_within_unit == False` (full-path constancy) +
  `is_balanced` are the authoritative gates.
- Rewrote §5.2 worked example reasoning chain to use the existing
  correct gates and added a counter-example showing
  `has_zero_dose=True` does NOT imply `has_never_treated=True` (the
  row-level vs unit-level distinction).
- Added `test_treatment_dose_does_not_gate_continuous_did` covering
  the two contradictory cases the reviewer named: (1) `0,0,d,d`
  within-unit dose path, asserting `treatment_varies_within_unit=True`
  (the actual ContinuousDiD gate fires correctly); (2) row-level zeros
  without never-treated units, asserting `has_zero_dose=True` BUT
  `has_never_treated=False` (the two facts are distinct).
- Removed `test_treatment_dose_continuous_time_varying_within_unit`
  and `test_treatment_dose_distinguishes_doses_at_high_precision` -
  both tested the dropped `is_time_invariant` field.
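The two contrasts at the heart of this P1 (full-path vs. nonzero-only distinct counts, and row-level zeros vs. unit-level never-treated) can be sketched in a few lines of plain Python. This is a hypothetical stand-in for the pandas `groupby(unit)[dose].nunique()` logic, not the library's code; names are illustrative.

```python
# Illustrative sketch of the two time-invariance checks the commit
# contrasts. Hypothetical helpers; the real library uses pandas.

def varies_full_path(dose_path):
    """ContinuousDiD-style gate: distinct count over the FULL dose path,
    including pre-treatment zeros (mirrors nunique() > 1)."""
    return len(set(dose_path)) > 1

def varies_nonzero_only(dose_path):
    """The dropped is_time_invariant heuristic: distinct count over
    non-zero doses only."""
    return len({d for d in dose_path if d != 0}) > 1

path = [0, 0, 2.5, 2.5]  # zero pre-treatment, constant dose afterwards

# Full-path check fires (the actual ContinuousDiD gate rejects this unit)...
assert varies_full_path(path) is True
# ...but the nonzero-only check silently calls it time-invariant.
assert varies_nonzero_only(path) is False

# Separately: row-level zeros do not imply a never-treated unit.
units = {"a": [0, 0, 2.5, 2.5], "b": [1.0, 1.0, 1.0, 1.0]}
has_zero_dose = any(d == 0 for p in units.values() for d in p)
has_never_treated = any(all(d == 0 for d in p) for p in units.values())
assert has_zero_dose and not has_never_treated
```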

P2 (WooldridgeDiD constructor kwarg):
- The autonomous guide §5.3 worked example used
  `WooldridgeDiD(family="poisson")` but the actual constructor at
  `wooldridge.py:264` takes `method=`. Following the example would
  raise `TypeError: __init__() got an unexpected keyword argument
  'family'`. Fixed in two places (the prose and the code snippet)
  and added a negative assertion in `test_guides.py` to prevent
  regression: `assert 'WooldridgeDiD(family="poisson")' not in text`.
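The negative assertion in `test_guides.py` takes roughly this shape; the snippet below runs it against an in-memory stand-in string rather than the real `llms-autonomous.txt`, so the `guide_text` value is illustrative only.

```python
# Hypothetical stand-in for the guide content; the real test loads
# diff_diff/guides/llms-autonomous.txt from the installed package.
guide_text = 'Fit with WooldridgeDiD(method="poisson") for count outcomes.'

# Regression guard: the broken kwarg spelling must never reappear...
assert 'WooldridgeDiD(family="poisson")' not in guide_text
# ...and the corrected spelling should be present.
assert 'WooldridgeDiD(method="poisson")' in guide_text
```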

CHANGELOG updated to reflect the revised TreatmentDoseShape scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 25, 2026
…corrected scope; cover new exports in import-surface test

P3 #1 (ROADMAP wording drift):
ROADMAP.md still said the new fields "gate WooldridgeDiD QMLE /
ContinuousDiD prerequisites pre-fit" and mentioned "time-invariance",
which contradicted the round-1 corrections to TreatmentDoseShape's
docstring + autonomous guide §2 + §5.2. Reworded to match: the new
fields add descriptive distributional context only;
`outcome_shape.is_count_like` informs (not gates) the WooldridgeDiD
QMLE judgment, and the authoritative ContinuousDiD pre-fit gates
remain `has_never_treated`, `treatment_varies_within_unit`, and
`is_balanced`. "Time-invariance" wording removed (the field was
dropped in round 1).

P3 #2 (import-surface test coverage):
`test_top_level_import_surface()` previously only verified
`profile_panel`, `PanelProfile`, `Alert`. Extended to also cover the
two new public exports `OutcomeShape` and `TreatmentDoseShape`,
asserting both their importability and their presence in
`diff_diff.__all__`.
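The extended import-surface assertions look roughly like this. Since `diff_diff` cannot be imported here, the sketch builds a throwaway module object with `types.ModuleType` to show the shape of the checks; everything about the fake module is an assumption.

```python
import types

# Hypothetical stand-in for the diff_diff package; the real test
# simply does `import diff_diff` and asserts against it.
diff_diff = types.ModuleType("diff_diff")
diff_diff.OutcomeShape = type("OutcomeShape", (), {})
diff_diff.TreatmentDoseShape = type("TreatmentDoseShape", (), {})
diff_diff.__all__ = ["profile_panel", "PanelProfile", "Alert",
                     "OutcomeShape", "TreatmentDoseShape"]

# Both new exports must be importable attributes AND listed in __all__.
for name in ("OutcomeShape", "TreatmentDoseShape"):
    assert hasattr(diff_diff, name)
    assert name in diff_diff.__all__
```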

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 25, 2026
…ontinuousDiD prerequisite summaries

Reviewer correctly noted that the round-2 wording lists
`has_never_treated` + `treatment_varies_within_unit == False` +
`is_balanced` as the "authoritative" ContinuousDiD pre-fit gates but
omits the duplicate-cell hard stop. Verified
`continuous_did.py:_precompute_structures` (line 818-823) builds
`outcome_matrix` cell-by-cell with last-row-wins on duplicate
`(unit, time)` keys - so absence of the `duplicate_unit_time_rows`
alert is also a real prerequisite, not just a style preference.

Updated wording in five places to add "+ absence of the
`duplicate_unit_time_rows` alert" alongside the other gates and
explain the silent-overwrite behavior:

- `diff_diff/profile.py` `TreatmentDoseShape` docstring
- `diff_diff/guides/llms-autonomous.txt` §2 field reference
- `diff_diff/guides/llms-autonomous.txt` §4.7 (continuous design feature)
- `diff_diff/guides/llms-autonomous.txt` §5.2 worked example reasoning
  chain (now lists four gates instead of three)
- `CHANGELOG.md` Unreleased entry
- `ROADMAP.md` AI-Agent Track building-block

Also softened "authoritative" -> "core field-based" since the
non-field-based duplicate-row gate makes the original phrasing
slightly misleading.

Added a test_guides.py regression asserting the autonomous guide
mentions `duplicate_unit_time_rows` so future wording changes can't
silently drop the gate from the summary.
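The silent last-row-wins behavior that makes the duplicate-row absence a real prerequisite can be sketched minimally: building a cell matrix keyed by `(unit, time)` lets a later duplicate row overwrite an earlier one with no error. This is an illustration of the failure mode, not the estimator's actual `_precompute_structures` code.

```python
# Duplicate (unit, time) cells overwrite silently when the outcome
# matrix is built cell-by-cell. Toy data; illustrative only.
rows = [
    ("u1", 1, 10.0),
    ("u1", 2, 12.0),
    ("u1", 2, 99.0),  # duplicate (unit, time) key
]

outcome_matrix = {}
for unit, time, y in rows:
    outcome_matrix[(unit, time)] = y  # later rows win, no error raised

assert outcome_matrix[("u1", 2)] == 99.0  # the first value was lost
assert len(outcome_matrix) == 2           # three rows collapsed to two cells
```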

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 25, 2026
…n estimand wording + is_count_like non-negativity guard

P1 #1 (Wooldridge Poisson estimand wording):
The guide §4.11 and §5.3 worked example described
`WooldridgeDiD(method="poisson")`'s `overall_att` as a
"multiplicative effect" / "log-link effect" / "proportional change"
to be reported. Verified against `wooldridge.py:1225`
(`att = _avg(mu_1 - mu_0, cell_mask)`) and
`_reporting_helpers.py:262-281` (registered estimand: "ASF-based
average from Wooldridge ETWFE ... average-structural-function (ASF)
contrast between treated and counterfactual untreated outcomes ...
on the natural outcome scale"): the actual quantity is
`E[exp(η_1)] - E[exp(η_0)]`, an outcome-scale DIFFERENCE, not a
multiplicative ratio. An agent following the previous wording would
misreport the headline scalar.

Rewrote both surfaces to:
- Describe the estimand as an ASF-based outcome-scale difference,
  citing `wooldridge.py:1225` and Wooldridge (2023) +
  REGISTRY.md §WooldridgeDiD nonlinear / ASF path.
- Explicitly note the headline `overall_att` is a difference on the
  natural outcome scale, NOT a multiplicative ratio.
- Mention that a proportional / percent-change interpretation can
  be derived post-hoc as `overall_att / E[Y_0]` but is not the
  estimator's reported scalar.

Added `test_autonomous_count_outcome_uses_asf_outcome_scale_estimand`
in `tests/test_guides.py`: extracts §4.11 and §5.3 blocks, asserts
forbidden phrases ("multiplicative effect under qmle", "estimates
the multiplicative effect", "multiplicative (log-link) effect",
"report the multiplicative effect", "report the multiplicative")
do NOT appear, and asserts §5.3 explicitly contains "ASF" and
"outcome scale" so future edits cannot silently weaken the
description.
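The outcome-scale-difference vs. multiplicative-ratio distinction can be made concrete with toy numbers. The sketch below assumes made-up log-scale linear predictors and mirrors the quoted `E[exp(η_1)] - E[exp(η_0)]` form; it is not the library's implementation.

```python
import math

# Toy log-scale linear predictors for treated (eta_1) and counterfactual
# untreated (eta_0) cells. Values are invented for illustration.
eta_1 = [1.2, 0.9, 1.5]
eta_0 = [1.0, 0.7, 1.1]

mu_1 = [math.exp(e) for e in eta_1]
mu_0 = [math.exp(e) for e in eta_0]

# ASF-based ATT: an outcome-scale DIFFERENCE, E[exp(eta_1)] - E[exp(eta_0)].
overall_att = sum(m1 - m0 for m1, m0 in zip(mu_1, mu_0)) / len(mu_1)
assert 0.8 < overall_att < 0.9  # a difference in expected counts

# A proportional interpretation can be derived post-hoc as
# overall_att / E[Y_0], but it is a separate quantity, not the
# estimator's reported scalar.
pct_change = overall_att / (sum(mu_0) / len(mu_0))
assert 0 < pct_change < overall_att
```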

P1 #2 (`is_count_like` non-negativity guard):
The `is_count_like` heuristic gated on integer-valued + has-zeros +
right-skewed + > 2 distinct values, but did NOT exclude negative
support. Verified against `wooldridge.py:1105-1109`: Poisson method
hard-rejects `y < 0` with `ValueError`. Without a value_min >= 0
guard, a right-skewed integer outcome with zeros and some negatives
would set `is_count_like=True` and steer an agent toward an
estimator that then refuses to fit.

Added `value_min >= 0.0` to the heuristic and explained the
non-negativity gate in the docstring + autonomous guide §2 field
reference (now reads
"is_integer_valued AND pct_zeros > 0 AND skewness > 0.5 AND
n_distinct_values > 2 AND value_min >= 0"). The guide also notes
that the gate exists specifically to align the routing signal with
WooldridgeDiD Poisson's hard non-negativity requirement.

Added `test_outcome_shape_count_like_excludes_negative_support` in
`tests/test_profile_panel.py` covering a Poisson-distributed outcome
with a small share of negative integers spliced in: asserts
`is_count_like=False` despite the other four conditions firing.
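The five-condition heuristic (including the new non-negativity guard) can be sketched with stdlib Python. Function name and thresholds mirror the commit message's description; this is not the library's real implementation, and the skewness formula here is the plain population moment ratio.

```python
def is_count_like(values):
    """Sketch of the five-condition heuristic described above:
    integer-valued AND pct_zeros > 0 AND skewness > 0.5
    AND > 2 distinct values AND value_min >= 0. Hypothetical code."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return False
    skew = sum((v - mean) ** 3 for v in values) / n / var ** 1.5
    return (
        all(float(v).is_integer() for v in values)  # integer-valued
        and any(v == 0 for v in values)             # pct_zeros > 0
        and skew > 0.5                              # right-skewed
        and len(set(values)) > 2                    # > 2 distinct values
        and min(values) >= 0.0                      # non-negative support
    )

poissonish = [0, 0, 0, 1, 1, 2, 2, 3, 5, 9]
assert is_count_like(poissonish) is True

# Splicing in a negative integer flips the heuristic off, aligning the
# routing signal with the Poisson fitter's hard non-negativity check.
assert is_count_like(poissonish + [-1]) is False
```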

P2 (test coverage for both P1s):
Both regressions above guard the new contracts. The guide test
guards the wording surface; the profile test guards the heuristic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 25, 2026
…e-dose gate to ContinuousDiD prerequisite summaries

P1 (newly identified — `dose_min > 0` ContinuousDiD gate omitted):
Reviewer correctly noted that the round-3 prerequisite summary
(`has_never_treated`, `treatment_varies_within_unit == False`,
`is_balanced`, no `duplicate_unit_time_rows` alert) omits the
estimator's strictly-positive-treated-dose restriction. Verified
`continuous_did.py:287-294` raises `ValueError` ("Dose must be
strictly positive for treated units (D > 0)") on negative treated
dose support — a panel can satisfy every documented gate above and
still hard-fail at fit time when `treatment_dose.dose_min < 0`.

Updated wording in five surfaces to add `treatment_dose.dose_min > 0`
as the fifth pre-fit gate:

- `diff_diff/profile.py` `TreatmentDoseShape` docstring (now spells
  out all five gates with numbered list referencing the
  `continuous_did.py:287-294` line range).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (notes
  `dose_min > 0` is itself a gate; the other treatment_dose
  sub-fields remain descriptive).
- `diff_diff/guides/llms-autonomous.txt` §4.7 (continuous design
  feature paragraph).
- `diff_diff/guides/llms-autonomous.txt` §5.2 worked example
  reasoning chain (now five gates checked; added a new
  counter-example covering the negative-dose path so an agent
  reading the example sees the contradictory case explicitly).
- `CHANGELOG.md` Unreleased entry.
- `ROADMAP.md` AI-Agent Track building-block.

P2 (regression test for negative-dose path):
Added `test_treatment_dose_min_flags_negative_dose_continuous_panels`
in `tests/test_profile_panel.py` covering a balanced, never-treated,
time-invariant continuous panel where every dose level is negative.
The test asserts that all four other gates pass cleanly and that
`dose.dose_min < 0` is correctly observed — the fixture an agent
would see when reasoning about whether ContinuousDiD applies.

Added `dose_min > 0` content-stability assertion in
`tests/test_guides.py` so future wording changes can't silently drop
the gate from the autonomous-guide summary.
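The negative-dose failure mode can be sketched as follows: a panel that satisfies the other checks still hard-fails the strictly-positive-treated-dose restriction. The helper below is a hypothetical stand-in for the fit-time check (the real one lives at `continuous_did.py:287-294`), reusing the quoted error message.

```python
def check_treated_doses(doses, first_treat):
    """Hypothetical stand-in for ContinuousDiD's treated-dose check."""
    treated = [d for d, ft in zip(doses, first_treat) if ft > 0]
    if any(d <= 0 for d in treated):
        raise ValueError(
            "Dose must be strictly positive for treated units (D > 0)")

# Balanced, never-treated controls, time-invariant doses -- but the
# treated units carry a negative dose level. Toy data.
doses       = [-1.5, -1.5, 0.0, 0.0]
first_treat = [3, 3, 0, 0]

dose_min = min(d for d in doses if d != 0)
assert dose_min < 0  # the profile-side fact an agent should inspect

try:
    check_treated_doses(doses, first_treat)
    raised = False
except ValueError:
    raised = True
assert raised  # the fit-time gate fires despite the other checks passing
```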

Rebased onto origin/main; resolved CHANGELOG conflicts (kept HAD
Phase 4.5 B + dCDH by_path Wave 2 + my Wave 2 entries side by side).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: ed57c7cffa81eec1685695f3ad11894729e41372


Overall Assessment

⚠️ Needs changes — the prior negative-dose / Wooldridge Poisson wording issues look resolved, but there is one unmitigated P1 in the new ContinuousDiD routing contract.

Executive Summary

  • The earlier dose_min > 0 omission appears fixed: the guide now reflects ContinuousDiD’s negative-dose rejection, and the Wooldridge Poisson text now matches the ASF outcome-scale estimand.
  • P1 [Newly identified]: the PR repeatedly calls a five-item PanelProfile checklist the full ContinuousDiD pre-fit gate set, but ContinuousDiD.fit() still has separate first_treat validity/consistency gates and still requires P(D=0) > 0 regardless of control_group.
  • The new OutcomeShape / is_count_like heuristic is aligned with the Poisson fitter’s non-negativity requirement and does not look methodologically off.
  • Static review only: pytest is not installed in this environment, so I could not execute the added tests.

Methodology

  • Severity: P1 [Newly identified]. Impact: the new docs present has_never_treated, treatment_varies_within_unit == False, is_balanced, no duplicate_unit_time_rows, and treatment_dose.dose_min > 0 as the full ContinuousDiD gate set. That is not the estimator’s full contract. ContinuousDiD.fit() also validates the separate first_treat column, rejects NaN/negative values, drops first_treat > 0 units with dose == 0, zeroes first_treat == 0 rows with nonzero dose, and enforces P(D=0) > 0 even under control_group="not_yet_treated". An agent can therefore mark a panel as “in scope” from PanelProfile alone even though the actual first_treat input will deterministically warn/error or change the sample. Concrete fix: stop calling the five profile-derived facts the “full” gate set; describe them as the profile-side screening surface and add an explicit caveat that ContinuousDiD.fit() still requires a valid/consistent first_treat column. Refs: diff_diff/guides/llms-autonomous.txt:L203-L235, diff_diff/guides/llms-autonomous.txt:L514-L526, diff_diff/guides/llms-autonomous.txt:L758-L788, diff_diff/profile.py:L65-L99, diff_diff/continuous_did.py:L230-L360, docs/methodology/REGISTRY.md:L719-L745, docs/methodology/continuous-did.md:L182-L184, ROADMAP.md:L141, CHANGELOG.md:L12.

Code Quality

No findings.

Performance

No findings.

Maintainability

  • Severity: P3. Impact: the new internal commentary is inconsistent. Public docs now say treatment_dose.dose_min > 0 is part of the profile-side ContinuousDiD screen, but _compute_treatment_dose() and test_treatment_dose_does_not_gate_continuous_did() still say treatment-dose fields “do not gate” ContinuousDiD. That inconsistency will make future regressions more likely. Concrete fix: align the helper/test docstrings with the public contract (“most fields descriptive; dose_min > 0 is the extra screen”). Refs: diff_diff/profile.py:L65-L99, diff_diff/profile.py:L689-L694, tests/test_profile_panel.py:L1110-L1128.

Tech Debt

No findings. The P1 above is not mitigated by TODO.md tracking or a registry deviation note.

Security

No findings.

Documentation/Tests

  • Severity: P2. Impact: the new tests pin the negative-dose wording and ASF outcome-scale wording, but they do not protect the missing first_treat caveat that causes the P1 above. Concrete fix: add a guide regression asserting that the ContinuousDiD section describes the five PanelProfile facts as screening checks only and explicitly mentions separate first_treat validation/consistency. Refs: tests/test_guides.py:L37-L69, diff_diff/guides/llms-autonomous.txt:L758-L788, diff_diff/continuous_did.py:L230-L360.

Path to Approval

  1. Replace every “full ContinuousDiD pre-fit gate set” claim in CHANGELOG.md, ROADMAP.md, diff_diff/profile.py, and diff_diff/guides/llms-autonomous.txt with “profile-side screening checks,” and document the separate first_treat contract from ContinuousDiD.fit().
  2. In llms-autonomous.txt §5.2, rewrite the has_never_treated == False counterexample so it states that the current implementation requires P(D=0) > 0 under both control_group options, not just the default.
  3. Add a regression in tests/test_guides.py that requires the ContinuousDiD section to mention separate first_treat validation/consistency; optionally pair it with a small tests/test_continuous_did.py scenario where the five profile-side checks pass but inconsistent first_treat still warns/errors.

igerber added a commit that referenced this pull request Apr 25, 2026
…nuousDiD prerequisite list as profile-side screening + add first_treat caveat

P1 (the five profile-derived facts are not the "full" gate set):
Reviewer correctly noted that calling
`{has_never_treated, treatment_varies_within_unit==False,
is_balanced, no duplicate_unit_time_rows alert, dose_min > 0}` the
"full ContinuousDiD pre-fit gate set" overreaches. `profile_panel`
only sees the four columns it accepts and CANNOT see the separate
`first_treat` column that `ContinuousDiD.fit()` consumes. Verified
against `continuous_did.py:230-360`: `fit()` additionally rejects
NaN/inf/negative `first_treat`, drops units with `first_treat > 0`
AND `dose == 0`, and force-zeroes `first_treat == 0` rows whose
`dose != 0` with a `UserWarning`. A panel that passes all five
profile-side checks can still surface warnings, drop rows, or raise
at fit time depending on the `first_treat` column the caller
supplies.

Reframed the wording in five surfaces from "full gate set" to
"profile-side screening checks" with an explicit caveat that the
checks are necessary-but-not-sufficient and that `ContinuousDiD.fit()`
applies separate `first_treat` validation:

- `diff_diff/profile.py` `TreatmentDoseShape` docstring (now spells
  out the screening framing explicitly + lists the `first_treat`
  validations that fit() applies).
- `diff_diff/profile.py` `_compute_treatment_dose` helper docstring
  (aligned with public contract: most fields descriptive,
  `dose_min > 0` is one of the screening checks).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (rewrote
  the multi-paragraph block to describe screening + first_treat
  caveat).
- `diff_diff/guides/llms-autonomous.txt` §4.7 (continuous design
  feature paragraph: screening checks + necessary-not-sufficient
  language + pointer to §2).
- `diff_diff/guides/llms-autonomous.txt` §5.2 worked example
  reasoning chain (rewrote step 2 to call out screening +
  first_treat caveat; clarified counter-example #4 that
  `P(D=0) > 0` is required under BOTH `control_group="never_treated"`
  and `"not_yet_treated"`, not just default).
- `CHANGELOG.md` Unreleased entry.
- `ROADMAP.md` AI-Agent Track.

P2 (test coverage for the missing `first_treat` caveat):
Added a content-stability assertion in `tests/test_guides.py`:
`assert "first_treat" in text` so the autonomous guide cannot
silently drop the explicit `first_treat` validation caveat.

P3 (helper / test-name inconsistency with public contract):
Renamed `test_treatment_dose_does_not_gate_continuous_did` to
`test_treatment_dose_descriptive_fields_supplement_existing_gates`
and rewrote its docstring to match the now-honest public contract
("most fields descriptive distributional context that supplements
the existing top-level screening checks"). The test body still
asserts the same two things — `treatment_varies_within_unit` fires
True on `0,0,d,d` paths and `has_never_treated` is independent of
`has_zero_dose` — both of which remain accurate.
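The force-zero coercion path described above can be sketched with a stdlib `warnings` example. This is a hypothetical simplification of the behavior attributed to `continuous_did.py:230-360`, not the library's code: rows declared never-treated (`first_treat == 0`) but carrying a nonzero dose get their dose zeroed with a `UserWarning`.

```python
import warnings

def coerce_never_treated(rows):
    """Hypothetical sketch: force-zero nonzero doses on first_treat == 0
    rows, warning instead of raising."""
    out = []
    for unit, first_treat, dose in rows:
        if first_treat == 0 and dose != 0:
            warnings.warn(
                "never-treated unit has nonzero dose; forcing dose to 0",
                UserWarning)
            dose = 0.0
        out.append((unit, first_treat, dose))
    return out

rows = [("a", 3, 2.0), ("b", 0, 1.5)]  # unit "b" relabeled never-treated

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    coerced = coerce_never_treated(rows)

assert coerced == [("a", 3, 2.0), ("b", 0, 0.0)]  # dose silently zeroed
assert any(issubclass(w.category, UserWarning) for w in caught)
```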

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 8601b611fa2c21684b91ecb58bf4b5df314557d6


Overall Assessment

Looks good — the prior P1 is resolved, and I did not find any new unmitigated P0/P1 issues in the changed code/docs.

Executive Summary

  • Prior re-review P1 is resolved: the ContinuousDiD language now consistently frames the five PanelProfile facts as a profile-side pre-flight screen and separately documents first_treat validation/coercion/drop behavior (diff_diff/profile.py:L65-L115, diff_diff/guides/llms-autonomous.txt:L203-L250, diff_diff/guides/llms-autonomous.txt:L530-L548, diff_diff/guides/llms-autonomous.txt:L780-L812, CHANGELOG.md:L12, ROADMAP.md:L141).
  • The new OutcomeShape.is_count_like routing is aligned with Wooldridge Poisson fit behavior by excluding negative-support outcomes, and the guide now correctly describes Poisson overall_att as an ASF-based outcome-scale difference (diff_diff/profile.py:L669-L683, docs/methodology/REGISTRY.md:L1333-L1348, diff_diff/wooldridge.py:L1102-L1109, diff_diff/wooldridge.py:L1191-L1245).
  • No estimator math, weighting, variance/SE, identification assumptions, or defaults changed in the implementation; this PR is limited to descriptive profiling fields, exports, guide content, and tests.
  • One minor documentation drift remains: the changelog summary omits the non-negativity clause from the is_count_like heuristic (CHANGELOG.md:L12).
  • Added tests cover the new profile fields and the prior review regression surface (tests/test_guides.py:L37-L185, tests/test_profile_panel.py:L889-L1313), but I could not execute them here because pytest and runtime deps such as numpy are unavailable.

Methodology

No findings. The changed docs now match the ContinuousDiD contract in docs/methodology/REGISTRY.md:L717-L745 and diff_diff/continuous_did.py:L222-L327,L348-L360, and the Wooldridge Poisson wording matches docs/methodology/REGISTRY.md:L1333-L1348 and diff_diff/wooldridge.py:L1191-L1245.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

  • Severity: P3
    Impact: CHANGELOG.md:L12 describes is_count_like without the value_min >= 0 / non-negative-support gate. Readers relying on the release notes alone may expect negative-valued outcomes to route as count-like even though the implementation intentionally suppresses that to stay compatible with WooldridgeDiD(method="poisson").
    Concrete fix: Update the changelog bullet to include the non-negativity clause, matching diff_diff/profile.py:L669-L683, diff_diff/guides/llms-autonomous.txt:L188-L198, and tests/test_profile_panel.py:L930-L971.

igerber added a commit that referenced this pull request Apr 25, 2026
…se in CHANGELOG is_count_like description

Reviewer correctly noted that the CHANGELOG bullet describing
`is_count_like` listed only four of the five conditions
(integer-valued + has zeros + right-skewed + > 2 distinct values)
but omitted the `value_min >= 0` non-negativity clause added in
round 4. Readers of the release notes alone would have expected
negative-valued count-like outcomes to route as `is_count_like=True`
even though the implementation intentionally suppresses that to stay
compatible with `WooldridgeDiD(method="poisson")`'s hard non-negative
requirement (`wooldridge.py:1105-1109`).

Updated the bullet to include the non-negativity clause and
explicitly cite the wooldridge.py line range so the release-notes
description matches `diff_diff/profile.py`,
`diff_diff/guides/llms-autonomous.txt`, and the
`test_outcome_shape_count_like_excludes_negative_support`
regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: bc13abb50011b5a5495d6448d458cff7dfa157e4


Overall Assessment

✅ Looks good

Executive Summary

Methodology

No findings. Cross-checking the changed profile/guide language against docs/methodology/REGISTRY.md:L717-L745 and docs/methodology/REGISTRY.md:L1333-L1349 found no undocumented deviation in estimator behavior.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

  • Severity: P3
    Impact: §4.7 contains two different summaries of the ContinuousDiD pre-fit contract. The opening bullet says there are only three eligibility prerequisites, while the later paragraph correctly adds the duplicate-row hard stop and the negative-dose screen. A reader who stops at the first summary can miss two load-bearing gates and route a duplicate-containing or negative-dose panel into a failed or silent-overwrite fit path (diff_diff/guides/llms-autonomous.txt:L507-L548, diff_diff/continuous_did.py:L222-L338, diff_diff/continuous_did.py:L800-L823).
    Concrete fix: Update the opening ContinuousDiD bullet in §4.7 to enumerate all five profile-side checks, or relabel the first three as “core estimator prerequisites” and explicitly cross-reference the fuller five-check screen immediately below.

Static review only: the added coverage in tests/test_profile_panel.py:L887-L1313 and tests/test_guides.py:L37-L182 looks appropriate, but I could not run it in this environment.

igerber added a commit that referenced this pull request Apr 25, 2026
…D opening bullet with the five-check screen below it

§4.7 had two summaries of the ContinuousDiD pre-fit contract that
disagreed: the opening bullet (Wave 1 era) said "Three eligibility
prerequisites", while the paragraph immediately below correctly
listed all five profile-side screening checks (Wave 2 added the
duplicate-row hard stop and the negative-dose `dose_min > 0`
screen). A reader who stopped at the opening bullet could miss
those two load-bearing gates and route a duplicate-containing or
negative-dose panel into a failed/silent-overwrite fit path.

Updated the opening bullet to enumerate all five profile-side
checks: (a) zero-dose controls / `P(D=0) > 0`, (b) per-unit
time-invariant dose, (c) balanced panel, (d) no
`duplicate_unit_time_rows` alert, (e) strictly positive treated
doses. Also clarified that the duplicate-row case (d) coerces
silently rather than raising — distinct from (a)/(b)/(c)/(e) which
all raise ValueError. Pointer to the paragraph below + §2 for the
`first_treat` validation surface kept intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: a8880b46a47e39fb3c89a60718cc9b80a7bc2994


Overall Assessment

✅ Looks good

Executive Summary

  • Prior re-review issue is resolved: the autonomous guide now consistently names all five ContinuousDiD profile-side checks and the separate first_treat validation surface.
  • No estimator implementation, weighting, variance/SE computation, identification assumption, or default behavior changed in this PR; the code changes are confined to descriptive PanelProfile metadata, exports, guide prose, and tests.
  • One methodology-facing docs issue remains: the new count-outcome guidance justifies Poisson QMLE by saying linear DiD asymptotic SEs assume “normal-shaped errors,” which does not match the library’s documented cluster-robust OLS inference contract.
  • One minor docs clarity issue remains: the new ContinuousDiD worked example is written as if doses follow a zero-pre/positive-post adoption path even though the estimator, the profile gate, and the same section later reject 0,0,d,d dose paths.
  • Static review only: I could not run the added tests here because this environment does not have pytest, numpy, or pandas.

Methodology

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: The new ContinuousDiD worked example is titled “zero baseline” and describes “block-style adoption in year 3,” but the shown profile has treatment_varies_within_unit=False, and the same section later correctly says a 0,0,d,d path is out of scope. That mixed framing can confuse an agent about the estimator’s full-path dose-invariance contract. Location: diff_diff/guides/llms-autonomous.txt:L759-L781, diff_diff/guides/llms-autonomous.txt:L822-L828, diff_diff/continuous_did.py:L222-L228. Concrete fix: rename the example to something like “continuous-dose panel with zero-dose controls,” and state explicitly that positive-dose units keep the same dose in every observed period while first_treat carries the timing information separately.

No additional test-coverage findings in the diff. Testing note: I could not execute tests/test_profile_panel.py or tests/test_guides.py in this environment because the required Python packages are not installed.

igerber and others added 3 commits April 25, 2026 06:45
…ationale in functional-form terms; rename §5.2 + clarify dose constancy

P2 (linear-OLS SE rationale wording):
The §4.11 and §5.3 prose justified `WooldridgeDiD(method="poisson")`
over linear-OLS DiD by claiming linear-OLS asymptotic SEs "assume
normal-shaped errors" that a right-skewed count distribution
violates. That misrepresents the library's documented inference
contract: linear ETWFE uses cluster-robust sandwich SEs (`OLS path`
in REGISTRY.md) which are valid for any error distribution with
finite moments — not just Gaussian.

Rewrote both surfaces in functional-form / efficiency terms:
- Linear-OLS DiD imposes an additive functional form on a
  non-negative count outcome. SEs are calibrated (cluster-robust)
  but the linear model can be inefficient and may produce
  counterfactual predictions outside the non-negative support.
- `WooldridgeDiD(method="poisson")` imposes a multiplicative
  (log-link) functional form that respects non-negativity and
  matches the typical generative process for count data; QMLE
  sandwich SEs are robust to distributional misspecification.
- The choice is about WHICH functional form best summarizes the
  treatment effect (additive vs multiplicative), not about
  whether SEs are calibrated.

§5.3 reasoning chain step 2 + step 3 rewritten to reflect this:
both estimators are now described as having calibrated inference;
the trade-off is parameterization (linear ATT vs. ASF outcome-scale
ATT). The Poisson non-negativity gate explanation is preserved.
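The functional-form point can be illustrated numerically: on a non-negative count outcome, an additive counterfactual can leave the support, while a multiplicative (log-link) one cannot. All numbers below are invented for illustration; nothing here is the library's output.

```python
import math

baseline = 0.4       # low count-outcome mean for some cell (made up)
additive_att = -1.0  # a linear model's additive effect (made up)

# Additive functional form: the counterfactual can go negative.
linear_counterfactual = baseline + additive_att
assert linear_counterfactual < 0  # outside the non-negative support

# Multiplicative (log-link) form: exp(.) keeps predictions positive.
log_link_effect = -1.0  # same magnitude, but on the log scale
poisson_counterfactual = baseline * math.exp(log_link_effect)
assert poisson_counterfactual > 0
```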

P3 (§5.2 example self-contradiction):
§5.2 was titled "Continuous-dose panel with zero baseline" with
prose "block-style adoption in year 3", which suggested a
`0,0,d,d` within-unit dose path — but the shown profile has
`treatment_varies_within_unit=False` (per-unit constant dose) and
the same section's later counter-example correctly says `0,0,d,d`
is out of scope for ContinuousDiD. Self-contradictory framing.

Renamed to "Continuous-dose panel with zero-dose controls" (per
reviewer suggestion) and clarified the prose: each dose group holds
its assigned dose value (including 0 for the never-treated
controls) in every observed period; adoption timing is carried
separately via the `first_treat` column passed to
`ContinuousDiD.fit()`, not via within-unit dose variation. Updated
the matching `test_autonomous_contains_worked_examples_section`
assertion in `tests/test_guides.py` to track the new title.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tcome rationale to functional-form framing

Round 9 rewrote §4.11 detailed bullet and §5.3 reasoning chain to
describe the Poisson-vs-linear choice in functional-form / efficiency
terms, but two surfaces still carried the old SE-based framing:

- §2 `is_count_like` field reference: "questionable asymptotic SEs"
- §4.11 preamble paragraph: "asymptotic SEs may mislead"

Both phrases conflict with the library's documented inference
contract (REGISTRY.md §WooldridgeDiD: linear ETWFE uses
cluster-robust sandwich SEs which are valid for any error
distribution with finite moments — not just Gaussian). Reviewer
correctly flagged this as the remaining methodology P2.

Updated both surfaces to match the round-9 framing:
- Linear-OLS DiD imposes an additive functional form on a
  non-negative count outcome; cluster-robust SEs remain
  calibrated but the linear model can be inefficient and may
  produce counterfactual predictions outside the non-negative
  support.
- `WooldridgeDiD(method="poisson")` is the multiplicative
  (log-link) ETWFE alternative that respects the non-negative
  support and matches the typical count-process generative model,
  with QMLE sandwich SEs robust to distributional misspecification.
- The choice is functional-form / efficiency / support, NOT SE
  calibration.

Verified no remaining "asymptotic SEs" / "questionable" / "mislead"
phrases in the guide via grep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…light checks as standard-workflow predictions, not estimator gates

Reviewer correctly noted that calling
{has_never_treated, treatment_varies_within_unit==False,
is_balanced, no duplicate_unit_time_rows alert, dose_min > 0}
the "screening checks" / "necessary" gates of `ContinuousDiD`
overstates the contract. `ContinuousDiD.fit()` keys off the
separate `first_treat` column (which `profile_panel` does not see),
defines never-treated controls as `first_treat == 0` rows,
force-zeroes nonzero `dose` on those rows with a `UserWarning`,
and rejects negative dose only among treated units `first_treat > 0`
(see `continuous_did.py:276-327` and `:348-360`).

Two of the five checks (`has_never_treated`, `dose_min > 0`) are
first_treat-dependent: agents who relabel positive- or negative-dose
units as `first_treat == 0` trigger the force-zero coercion path
with a `UserWarning` and may still fit panels that fail those
preflights, shifting the methodology in the process. The other
three (`treatment_varies_within_unit`, `is_balanced`, duplicate-row
absence) are real fit-time gates that hold regardless of how
`first_treat` is constructed.
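
The coercion path described above follows roughly this pattern; the
function, column names, and warning text here are illustrative
assumptions, not the library's actual `fit()` code:

```python
import warnings

import pandas as pd

def coerce_never_treated_dose(df: pd.DataFrame) -> pd.DataFrame:
    """Force dose to zero on first_treat == 0 rows, warning if any change."""
    df = df.copy()
    mask = (df["first_treat"] == 0) & (df["dose"] != 0)
    if mask.any():
        warnings.warn(
            f"Zeroing nonzero dose on {int(mask.sum())} never-treated "
            "(first_treat == 0) rows.",
            UserWarning,
        )
        df.loc[mask, "dose"] = 0.0
    return df

panel = pd.DataFrame({
    "first_treat": [0, 0, 3],
    "dose": [-1.0, 0.0, 2.0],   # negative dose on a never-treated row
})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cleaned = coerce_never_treated_dose(panel)
```

The never-treated rows are zeroed (not rejected), which is why the
two first_treat-dependent preflights can pass fit even when they fail
at profile time.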

Reframed every wording site to call these "standard-workflow
preflight checks" — predictive when the agent derives `first_treat`
from the same dose column passed to `profile_panel`, but not the
estimator's literal contract:

- `diff_diff/profile.py` `TreatmentDoseShape` docstring (rewrote
  the multi-paragraph block; explicit standard-workflow definition
  + per-check first_treat dependency map + force-zero coercion
  caveat).
- `diff_diff/profile.py` `_compute_treatment_dose` helper docstring
  (already brief; stays consistent).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (long
  rewrite covering the standard-workflow framing + override paths).
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet +
  trailing paragraph (both updated; opening bullet now spells out
  which of the five checks are first_treat-dependent vs. hard
  fit-time stops; trailing paragraph promotes the standard-
  workflow framing).
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain step
  2 (rewrote the gate-checking paragraph; counter-example #4
  expanded to enumerate (a) supply matching first_treat and accept
  rejection, (b) deliberate relabel + coercion, (c) different
  estimator; counter-example #5 distinguishes negative-dose
  treated-unit rejection from never-treated coercion).
- `CHANGELOG.md` Wave 2 entry (matches the new framing).
- `ROADMAP.md` AI-Agent Track building block (matches).

Test coverage:
- Renamed assertion messages in
  `test_treatment_dose_descriptive_fields_supplement_existing_gates`
  and `test_treatment_dose_min_flags_negative_dose_continuous_panels`
  to remove "authoritative gate" phrasing; reframed as "standard-
  workflow preflight" assertions consistent with the corrected docs.
- Added `test_negative_dose_on_never_treated_coerces_not_rejects`
  in `tests/test_continuous_did.py::TestEdgeCases` covering the
  reviewer's specific request: never-treated rows with NEGATIVE
  nonzero dose must coerce (with `UserWarning`) rather than raise
  the treated-unit negative-dose error. Sister to the existing
  `test_nonzero_dose_on_never_treated_warns` which covers the
  positive-dose case.

Rebased onto origin/main during this round (no conflicts beyond
prior CHANGELOG resolutions; main advanced 19 commits).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the agent-profile-shape-extensions branch from 31e8875 to 25a1197 Compare April 25, 2026 10:53

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 25a1197a1d7f7f73fff10107bb5e49366c3cceb9


Overall Assessment

⚠️ Needs changes

Executive Summary

  • Re-review focus: the earlier ContinuousDiD “authoritative gate” problem is mostly fixed. The new text now correctly treats has_never_treated and dose_min > 0 as standard-workflow, first_treat-dependent preflight checks rather than universal fit-time gates.
  • No estimator math, weighting, variance/SE, or default-fit behavior changed in this diff.
  • P1 The new agent-facing docs now go beyond the documented coercion note and present deliberate first_treat == 0 relabeling of nonzero-dose units as an agent option to “manufacture controls” / still fit ContinuousDiD. That is not supported by the Methodology Registry or the estimator’s own requirement that actual P(D=0) > 0 exist.
  • P2 Some new wording also says duplicate-row failures are “hard fit-time stops,” but ContinuousDiD still silently applies last-row-wins on duplicate (unit, time) cells.
  • Static review only: python -m pytest tests/test_profile_panel.py tests/test_guides.py tests/test_continuous_did.py -q could not be run because pytest is not installed.

Methodology

  • P1 Impact: the prior mismatch is not fully closed because the new docs treat intentional relabeling of positive-dose / nonzero-dose units to first_treat == 0 as an available fallback for ContinuousDiD (diff_diff/guides/llms-autonomous.txt:L241-L247, diff_diff/guides/llms-autonomous.txt:L893-L900, diff_diff/profile.py:L88-L97, ROADMAP.md:L141-L141, CHANGELOG.md:L13-L13). The registry and implementation only document this as inconsistent-input coercion with a warning, while still requiring actual P(D=0) > 0 because Remark 3.1 is unimplemented (docs/methodology/REGISTRY.md:L719-L745, diff_diff/continuous_did.py:L319-L360). An autonomous agent following the worked example could therefore alter the data to create synthetic controls without any documented methodological basis. Concrete fix: remove/rewrite the “relabel to first_treat == 0” fallback everywhere; keep the coercion note as implementation behavior for bad/inconsistent inputs, not as a valid routing option. When has_never_treated == False or dose_min <= 0, direct users to re-encode treatment or choose a different estimator unless REGISTRY is updated with a documented deviation.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings beyond the documentation inconsistency noted below.

Tech Debt

  • No findings. I did not see this issue tracked in TODO.md.

Security

  • No findings.

Documentation/Tests

  • P2 Impact: some new text now says the duplicate-row check is a “hard fit-time stop” (diff_diff/profile.py:L117-L122, diff_diff/guides/llms-autonomous.txt:L864-L866), but the estimator still silently overwrites duplicate (unit, time) rows in _precompute_structures() (diff_diff/continuous_did.py:L820-L823). That contradiction can make readers rely on fit-time protection that does not exist. Concrete fix: change this wording to “hard preflight veto / must deduplicate before fit because ContinuousDiD otherwise uses last-row-wins,” matching the actual implementation and the other accurate duplicate-row warnings already present in the guide.
  • Static note: tests were not executed here because pytest is unavailable in the environment.

Path to Approval

  1. Remove the new “manufacture controls / relabel to first_treat == 0” fallback from the guide, profile docstring, changelog, and roadmap, and replace it with wording that matches REGISTRY.md and ContinuousDiD.fit() exactly.
  2. Update the §5.2 continuous-dose worked example and related prose so the no-never-treated / negative-dose counterexamples route agents to re-encoding or alternative estimators, not deliberate relabeling.
  3. Add or update guide/profile regression tests so they lock in the corrected wording and do not normalize relabeling as an admissible methodology.

…s-fallback wording; correct duplicate-row "fit-time stop" claim

P1 (relabel-to-manufacture-controls misframing):
Round 11 introduced wording across the guide, profile docstring,
CHANGELOG, ROADMAP, and test docstrings that presented intentional
`first_treat == 0` relabeling of nonzero-dose units as an
"option" / "fallback" for fitting `ContinuousDiD` when the
profile-side preflights (`has_never_treated`, `dose_min > 0`)
fail. REGISTRY does not document this as a routing option, and the
estimator still requires actual `P(D=0) > 0` because Remark 3.1
lowest-dose-as-control is not yet implemented. The force-zero
coercion at `continuous_did.py:311-327` is implementation behavior
for INCONSISTENT inputs (e.g., user accidentally passes nonzero
dose on a never-treated row), not a methodology fallback.

Reworded every site to remove the relabeling-as-option framing and
replace it with the registry-documented fixes when (1) or (5)
fails: re-encode the treatment column to a non-negative scale that
contains a true never-treated group, or route to a different
estimator (`HeterogeneousAdoptionDiD` for graded-adoption panels;
linear DiD with the treatment as a continuous covariate). Every
remaining "manufacture controls" mention in the guide, profile,
and tests is now an explicit anti-recommendation ("do not relabel
... to manufacture controls"). Updated:

- `diff_diff/profile.py` `TreatmentDoseShape` docstring (item (1):
  "not an opportunity to relabel ..."; item (5): coercion is
  "implementation behavior for inconsistent inputs, not a
  methodological fallback").
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (the
  When-(1)-or-(5)-fails paragraph names re-encode + alternative
  estimator only; explicit anti-relabel warning).
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet +
  trailing paragraph (consolidated; opening bullet drops the
  relabel-as-fallback framing; trailing paragraph trimmed to a
  pointer to §2).
- `diff_diff/guides/llms-autonomous.txt` §5.2 step 2 + counter-
  example #4 + counter-example #5 (relabel-as-option language
  removed; explicit "do not relabel" callouts; counter-example #4
  options trimmed to (a) re-encode and (b) different estimator).
- `CHANGELOG.md` (relabel-as-option clause removed; replaced with
  re-encode / different-estimator framing).
- `ROADMAP.md` (same).
- `tests/test_profile_panel.py` two test docstrings (relabel-as-
  workflow language removed).

P2 (duplicate-row "hard fit-time stop" misclaim):
Round 11 wording said "duplicate-row failures are hard fit-time
stops" — incorrect. `_precompute_structures` at
`continuous_did.py:818-823` silently overwrites with last-row-wins,
no exception raised. Reworded as "hard preflight veto: the agent
must deduplicate before fit because `ContinuousDiD` otherwise uses
last-row-wins, no fit-time exception" in profile.py docstring,
guide §4.7 opening bullet, and §5.2 step 2 (now defers to §2 for
the breakdown). The previously-correct §2 description of the
silent-coerce path is preserved.
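
A short sketch of the distinction (assumed column names; not the
library's `_precompute_structures`): the estimator-side behavior is
effectively last-row-wins, so the agent-side preflight must detect
duplicate cells itself before fit.

```python
import pandas as pd

panel = pd.DataFrame({
    "unit": [1, 1, 2],
    "time": [1, 1, 1],          # unit 1 has a duplicate (unit, time) cell
    "y":    [10.0, 99.0, 5.0],
})

# What a silent precompute effectively does: the later row overwrites
# the earlier one, with no exception raised.
last_wins = panel.drop_duplicates(["unit", "time"], keep="last")

# The preflight veto: detect duplicates before calling fit().
has_duplicates = bool(panel.duplicated(["unit", "time"]).any())
```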

Length housekeeping:
The round-11-to-round-12 expansion pushed `llms-autonomous.txt`
past `llms-full.txt` in length, breaking `test_full_is_largest`.
Trimmed ~2.7KB by consolidating the §4.7 trailing paragraph + §5.2
step 2 trailing block to point at §2's full breakdown rather than
duplicating the per-check semantics. autonomous: 65364 chars,
full: 66058 chars.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 307f424c9580d2739ab4b05313faaec4bb3b6f2e


Overall Assessment

⚠️ Needs changes

Executive Summary

  • Re-review: the earlier user-facing “manufacture controls” guidance appears fixed in the guide/profile prose.
  • P1 The replacement ContinuousDiD preflight story is still wrong: it tells agents to derive first_treat from the same dose column that profile_panel() inspects, but the method/library contract uses a time-invariant unit dose plus a separate timing variable.
  • The contradiction is internal to the new §5.2 example itself: it first describes constant per-unit dose with separate first_treat, then bases the screening logic on inferring first_treat from that same dose column.
  • The same assumption is now duplicated in diff_diff/profile.py and new tests/test_profile_panel.py regression prose, so the mismatch will persist unless those surfaces are corrected together.
  • No estimator math, weighting, variance, or inference code changed in this diff.
  • Static review only: this environment lacks numpy/pytest, so I could not execute the targeted test files.

Methodology

  • P1 Impact: the new “standard workflow” in diff_diff/guides/llms-autonomous.txt:L219-L239, diff_diff/guides/llms-autonomous.txt:L530-L563, diff_diff/guides/llms-autonomous.txt:L801-L852, diff_diff/profile.py:L72-L130, and tests/test_profile_panel.py:L1111-L1199 says agents can derive first_treat from the same dose column that profile_panel() sees. That conflicts with the method contract: multi-period continuous DiD uses a time-invariant unit dose D_i and a separate timing group G_i/first_treat (docs/methodology/continuous-did.md:L65-L73), ContinuousDiD.fit() hard-rejects within-unit dose variation (diff_diff/continuous_did.py:L222-L228), and the library’s own generator emits constant dose with separate first_treat (diff_diff/prep_dgp.py:L970-L993). REGISTRY documents the untreated-group and balanced-panel requirements, but not any “derive timing from dose” workflow (docs/methodology/REGISTRY.md:L717-L723). An agent following the new guidance must either build a 0,0,d,d path to recover timing, which fit() rejects, or keep a valid constant-dose panel, in which case the dose column cannot identify first_treat at all. Concrete fix: remove the “derive first_treat from the same dose column” workflow from the guide/profile text and replace it with the actual contract: profile_panel() on the dose column alone can only provide descriptive dose-support checks; timing-dependent ContinuousDiD eligibility requires a separate first_treat column already supplied by the caller, or a new helper that validates dose and first_treat together.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • P2 Impact: PanelProfile is a public top-level type (diff_diff/__init__.py:L503-L506), but the PR adds two new required dataclass fields without defaults (diff_diff/profile.py:L179-L188). Any direct caller that instantiates PanelProfile(...) manually will now fail with TypeError, even though both additions are semantically optional. Concrete fix: make outcome_shape and treatment_dose backward-compatible additions, e.g. keyword-only fields with defaults, and add a regression test for direct construction.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • P3 Impact: the new docstring in tests/test_continuous_did.py:L903-L911 still says the coercion path “lets agents legally relabel negative-dose units as first_treat == 0,” which contradicts the corrected guide/profile/registry position. The behavior under test is fine; the comment is not. Concrete fix: rewrite the docstring to say coercion is observed implementation behavior for inconsistent inputs, not a sanctioned routing option.
  • Residual risk: static review only; I could not run tests/test_profile_panel.py, tests/test_guides.py, or tests/test_continuous_did.py because the environment is missing numpy/pytest.

Path to Approval

  1. Remove/rewrite the “derive first_treat from the same dose column” workflow in diff_diff/guides/llms-autonomous.txt, diff_diff/profile.py, CHANGELOG.md, and ROADMAP.md.
  2. Rewrite §5.2 and the new tests/test_profile_panel.py regression prose around the actual ContinuousDiD contract: time-invariant unit dose D_i, separate first_treat/G_i.
  3. If predictive pre-fit screening is still desired, add a helper or optional first_treat= input that validates dose and first_treat together; otherwise explicitly narrow the docs to descriptive dose-support checks only.

… first_treat from dose" framing; add PanelProfile backward-compat defaults; fix test_continuous_did docstring

P1 (canonical ContinuousDiD setup vs. derive-from-dose framing):
Round 12 introduced a "standard workflow" description across the
guide, profile docstring, CHANGELOG, ROADMAP, and test docstrings
that said agents derive `first_treat` from the same dose column
passed to `profile_panel`. Reviewer correctly noted this conflicts
with the actual ContinuousDiD contract (`continuous_did.py:222-228`,
`prep_dgp.py:970-993`, `docs/methodology/continuous-did.md:65-73`):
the canonical setup uses a **time-invariant per-unit dose** `D_i`
and a **separate `first_treat` column** the caller supplies — the
dose column has no within-unit time variation in this setup, so it
cannot encode timing. An agent following the rejected framing would
either build a `0,0,d,d` path (which `fit()` rejects) or keep a
valid constant-dose panel (in which case the dose column carries no
timing information).
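
Under the canonical setup just described, a valid input panel looks
like this sketch (column names are assumed from the discussion, and
the panel is built by hand rather than by the library's generator):

```python
import pandas as pd

# Per-unit facts: a time-invariant dose D_i and a SEPARATE timing column.
units = pd.DataFrame({
    "unit": [1, 2, 3],
    "dose": [0.0, 1.5, 2.0],   # D_i constant within unit; 0 marks never-treated
    "first_treat": [0, 2, 3],  # timing lives in its own column, not in dose
})
periods = pd.DataFrame({"time": [1, 2, 3]})
panel = units.merge(periods, how="cross")

# Dose never varies within a unit, so it cannot encode adoption timing;
# the convention ties first_treat == 0 to D_i == 0.
dose_is_time_invariant = bool((panel.groupby("unit")["dose"].nunique() == 1).all())
```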

Reworded every site to drop the derive-from-dose framing and
replace with the canonical setup. The five facts on the dose column
remain predictive of `fit()` outcomes BECAUSE the canonical
convention ties `first_treat == 0` to `D_i == 0` and treated units
carry their constant dose across all periods — so `has_never_treated`
proxies `P(D=0) > 0` and `dose_min > 0` predicts the strictly-
positive-treated-dose requirement, without any "derivation" of
`first_treat` from the dose column. Updated:

- `diff_diff/profile.py` `TreatmentDoseShape` docstring (rewrote
  the multi-paragraph block to use the canonical-setup framing
  and added an explicit "agent must validate `first_treat`
  independently" note).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference.
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet.
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain
  step 2 + counter-examples #4 and #5 (now describe the
  canonical setup rather than a derive-from-dose workflow).
- `CHANGELOG.md` Wave 2 entry.
- `ROADMAP.md` AI-Agent Track building block.
- `tests/test_profile_panel.py`
  `test_treatment_dose_min_flags_negative_dose_continuous_panels`
  docstring/comments.

P2 (PanelProfile direct-construction backward compat):
Wave 2 added `outcome_shape` and `treatment_dose` to PanelProfile
without defaults, breaking direct `PanelProfile(...)` calls that
predate Wave 2. Made both fields default to `None` (moved them to
the end of the field list; both are `Optional[...]`). Added
`test_panel_profile_direct_construction_without_wave2_fields`
asserting that direct construction without the new fields succeeds
and yields `None` defaults that serialize correctly through
`to_dict()`.
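
The backward-compatible pattern described above can be sketched with
plain dataclasses; the field names other than the two new ones are
placeholders, not PanelProfile's real schema:

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass(frozen=True)
class OutcomeShape:
    is_count_like: Optional[bool] = None
    is_integer_valued: Optional[bool] = None

@dataclass(frozen=True)
class TreatmentDoseShape:
    is_time_invariant: Optional[bool] = None

@dataclass(frozen=True)
class PanelProfile:
    n_units: int
    n_periods: int
    # New fields go last and default to None, so pre-existing direct
    # construction like PanelProfile(50, 8) keeps working.
    outcome_shape: Optional[OutcomeShape] = None
    treatment_dose: Optional[TreatmentDoseShape] = None

    def to_dict(self) -> dict:
        return asdict(self)  # nested frozen dataclasses serialize recursively

profile = PanelProfile(50, 8)  # pre-Wave-2 call signature still valid
```

Defaulted fields must trail the required ones, which is why the new
fields move to the end of the field list.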

P3 (test_continuous_did.py docstring overstating sanction):
The new `test_negative_dose_on_never_treated_coerces_not_rejects`
docstring said the contract "lets agents legally relabel
negative-dose units as `first_treat == 0` to coerce them away."
Reworded as observed implementation behavior for inconsistent
inputs, NOT a sanctioned routing option — the test locks in the
coercion contract while the autonomous guide §5.2 explicitly tells
agents not to use this path methodologically.

Length invariant maintained: autonomous (65748 chars) < full
(66031 chars); `test_full_is_largest` still passes (compares
character count, not byte count, so on-disk size with UTF-8
multi-byte characters differs from the assertion target).
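
The character-vs-byte distinction in that parenthetical is easy to see
directly (generic Python, unrelated to the guide files themselves):

```python
s = "§ café"                         # contains multi-byte UTF-8 characters
chars = len(s)                       # what a character-count assertion compares
disk_bytes = len(s.encode("utf-8"))  # what on-disk file size reports
```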

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 60014f5c25cc54e67380a1d0c59cf93b62dc0f72


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review: the prior P1 about telling agents to derive first_treat from the dose column is resolved on the public-facing surfaces. The updated ContinuousDiD guidance now matches the canonical contract: time-invariant unit dose D_i, separate caller-supplied first_treat, and separate fit-time validation of first_treat (diff_diff/profile.py:L65-L141, diff_diff/guides/llms-autonomous.txt:L206-L283, diff_diff/guides/llms-autonomous.txt:L810-L919, diff_diff/continuous_did.py:L222-L360).
  • The new OutcomeShape.is_count_like guidance is consistent with the actual Poisson path: WooldridgeDiD(method="poisson") rejects negative outcomes, and the guide now correctly describes the reported overall_att as an ASF-based outcome-scale difference rather than a multiplicative reported estimand (diff_diff/profile.py:L700-L725, diff_diff/wooldridge.py:L1102-L1245, docs/methodology/REGISTRY.md:L1333-L1349).
  • The public API propagation looks complete for this PR: the new dataclasses are exported from diff_diff, and PanelProfile keeps backward compatibility by defaulting the new fields (diff_diff/__init__.py:L252-L258, diff_diff/__init__.py:L503-L509, diff_diff/profile.py:L189-L193, tests/test_profile_panel.py:L1325-L1362).
  • P3: two new regression docstrings still restate the old “derive first_treat from dose” workflow. This is test-only prose, not a runtime or methodology bug, but it leaves stale language in the tree (tests/test_profile_panel.py:L1110-L1137, tests/test_profile_panel.py:L1184-L1199).
  • Static review only: I could not run the touched tests in this environment because pytest is unavailable (python -m pytest ... failed with No module named pytest).

Methodology

  • No unmitigated findings. The affected methods are ContinuousDiD and WooldridgeDiD, but only through pre-fit descriptive guidance, not estimator math or inference. The updated text aligns with the ContinuousDiD contract in docs/methodology/continuous-did.md:L65-L76, docs/methodology/continuous-did.md:L265-L269, docs/methodology/REGISTRY.md:L717-L745, and diff_diff/continuous_did.py:L222-L360, and with the Poisson ASF path documented for WooldridgeDiD in docs/methodology/REGISTRY.md:L1333-L1349 and implemented at diff_diff/wooldridge.py:L1102-L1245.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • Severity: P3
    Impact: tests/test_profile_panel.py:L1110-L1137 and tests/test_profile_panel.py:L1184-L1199 still describe a “standard ContinuousDiD workflow” where first_treat is derived from the same dose column. That contradicts the now-corrected contract in diff_diff/profile.py:L65-L141 and diff_diff/guides/llms-autonomous.txt:L206-L283, diff_diff/guides/llms-autonomous.txt:L810-L919. It will not change runtime behavior, but it can mislead future editors and make this methodology regression easier to reintroduce.
    Concrete fix: rewrite those test docstrings/comments to the canonical formulation already used elsewhere: time-invariant per-unit dose D_i, separate caller-supplied first_treat, and profile-side screening only on the dose column.

  • Residual risk: static review only; targeted pytest execution was not possible in this environment because pytest is not installed.

…derive first_treat from dose" wording from two test docstrings

Round 13 dropped the "standard workflow / derive first_treat from
dose" framing across the public-facing surfaces (profile.py
docstring, autonomous guide §2/§4.7/§5.2, CHANGELOG, ROADMAP) but
left two test docstrings carrying the rejected framing:

- `test_treatment_dose_descriptive_fields_supplement_existing_gates`
- `test_treatment_dose_min_flags_negative_dose_continuous_panels`

These are test-only prose, not runtime or methodology bugs, but
the stale language could mislead future editors and make the same
methodology regression easier to reintroduce. Reworded both
docstrings to use the canonical formulation now used elsewhere:
time-invariant per-unit dose `D_i`, separate caller-supplied
`first_treat`, profile-side screening on the dose column only,
and explicit reference to "with a `first_treat` column consistent
with the dose column on per-unit treated/untreated status" rather
than "agent derives `first_treat` from the dose column".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: aea7b1d11d1c852e30fa6b5e2e687d68edd1b9a9


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The prior re-review concern about stale ContinuousDiD prose appears resolved: the touched guide/tests now consistently describe the canonical setup as time-invariant unit dose D_i plus a separate caller-supplied first_treat column.
  • Severity P1: the new ContinuousDiD guidance repeatedly treats treatment “re-encoding” as a documented or registry-backed fix when P(D=0)=0 or treated doses are negative, but the methodology docs only document that the current implementation requires P(D=0)>0 and that Remark 3.1’s lowest-dose-as-control path is not implemented.
  • No estimator math or inference code changed in this PR, and I did not find new NaN/inference anti-patterns or parameter-propagation gaps in the touched runtime code.
  • The new OutcomeShape / TreatmentDoseShape API surface is otherwise propagated cleanly: top-level exports, backward-compatible PanelProfile defaults, and to_dict() serialization all line up.
  • Static review only: I could not run the touched tests here because the available python3 environment lacks numpy and pytest.

Methodology

  • Severity: P1
    Impact: ContinuousDiD’s documented contract is still “requires P(D=0)>0,” and the paper’s no-untreated-group fallback is the lowest-dose-as-control remark, which the registry explicitly marks as not implemented (docs/methodology/REGISTRY.md:L719-L745, docs/methodology/continuous-did.md:L183-L184, docs/methodology/continuous-did.md:L265-L269). The PR’s new agent-facing text instead says the “documented fixes” / “registry-documented options” are to re-encode the treatment scale into a true never-treated group or non-negative support (diff_diff/profile.py:L82-L96, diff_diff/profile.py:L113-L121, diff_diff/guides/llms-autonomous.txt:L244-L247, diff_diff/guides/llms-autonomous.txt:L562-L565, diff_diff/guides/llms-autonomous.txt:L593-L595, diff_diff/guides/llms-autonomous.txt:L893-L911). That is a new, undocumented methodological recommendation on the library’s agent surface, and the new tests codify it (tests/test_profile_panel.py:L1138-L1142, tests/test_profile_panel.py:L1200-L1202).
    Concrete fix: Reword these passages to the supported contract: current ContinuousDiD requires untreated D=0 units and positive treated doses; lowest-dose-as-control exists in the paper but is not implemented here. If the project intentionally wants to endorse recoding as acceptable preprocessing, add an explicit Note:/deviation entry to docs/methodology/REGISTRY.md that defines when that recoding is valid and how it changes the estimand, then update the guide/docstrings/tests to cite that new note.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. I did not find an existing TODO.md entry that mitigates the methodology issue above.

Security

No findings.

Documentation/Tests

No additional findings beyond the Methodology item. Coverage on the new profile fields and guide wording is otherwise strong, and the earlier stale test-prose issue looks resolved, but I could not execute the suite in this environment because python3 lacks numpy and pytest.

Path to Approval

  1. Remove the unsupported “documented fix” / “registry-documented fix” wording for ContinuousDiD recoding across the touched surfaces (diff_diff/profile.py, diff_diff/guides/llms-autonomous.txt, CHANGELOG.md, ROADMAP.md, and the new assertions/comments in tests/test_profile_panel.py) so they match the current registry.
  2. If recoding guidance is intentional, document it first in docs/methodology/REGISTRY.md with the exact admissible cases and estimand consequences, then update the guide/docstrings/tests to reference that note instead of implying the support already exists.

…fixes" overclaim for ContinuousDiD recoding

P1 (overclaiming registry endorsement of recoding):
Reviewer correctly noted the round-13/14 wording across the
public-facing surfaces called re-encoding the treatment column a
"registry-documented fix" / "documented option" / "documented
fallback". REGISTRY only documents the `P(D=0) > 0` requirement
and explicitly notes Remark 3.1's lowest-dose-as-control fallback
is NOT implemented in this library. Re-encoding is an agent-side
preprocessing choice that the registry neither endorses nor
forbids — calling it "registry-documented" was an over-claim.

Reworded twelve sites to drop the "documented" framing:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (items
  (1) and (5)).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference
  When-(1)-or-(5)-fails paragraph.
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet
  trailing language.
- `diff_diff/guides/llms-autonomous.txt` §4.7 trailing paragraph
  (consolidated to a pointer at §2; reduced redundancy).
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain
  counter-example #4.
- `tests/test_profile_panel.py` two test docstrings + one
  inline assertion message + one trailing comment.
- `CHANGELOG.md` Wave 2 entry.
- `ROADMAP.md` AI-Agent Track building block.

The corrected framing across all surfaces:
- Honestly state the contract: `ContinuousDiD` requires
  `P(D=0) > 0` and positive treated doses; Remark 3.1 not
  implemented.
- When the contract isn't met, say `ContinuousDiD` "as currently
  implemented does not apply" — not "do this fix."
- Mention routing alternatives that ARE in the library and DON'T
  require `P(D=0) > 0`: `HeterogeneousAdoptionDiD`, linear DiD
  with a continuous covariate. Those are routing facts, not
  methodology endorsements.
- Re-encoding stays in the prose as an "agent-side preprocessing
  choice that changes the estimand and is not documented in
  REGISTRY as a supported fallback" — explicitly NOT endorsed.

Length housekeeping: trimmed redundancy in the §4.7 trailing
paragraph (consolidated to a pointer at §2) and tightened the §2
recoding paragraph; autonomous (65984 chars) < full (66031),
`test_full_is_largest` green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 31a843a37a6c229445d003d694b26a0acebd9233


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The prior re-review P1 about ContinuousDiD recoding language looks resolved: the new guide/docstrings now say re-encoding changes the estimand and is not a registry-backed fallback.
  • Severity P1 [Newly identified]: the new ContinuousDiD preflight guidance leans on PanelProfile.has_never_treated, but its implementation does not match its documented contract for continuous panels.
  • diff_diff/profile.py:L356-L360 defines has_never_treated as “some unit has treatment == 0 in every observed non-NaN row,” while diff_diff/profile.py:L533-L540 sets it from groupby(unit).max() == 0 only.
  • That can silently false-positive on mixed negative/zero dose paths, and the new autonomous guide now uses that field as the proxy for P(D=0) > 0 in diff_diff/guides/llms-autonomous.txt:L115-L121 and diff_diff/guides/llms-autonomous.txt:L228-L245.
  • I did not find new estimator-math, variance/SE, NaN-inference, or parameter-propagation issues in the touched runtime code.
  • Static review only: I could not execute the touched tests locally because the available Python environment lacks pandas.

Methodology

  • Severity: P1 [Newly identified]. Impact: the PR’s new ContinuousDiD routing surface assumes has_never_treated means “zero in every observed row,” but the implementation returns True whenever any unit’s maximum observed dose is 0. On a continuous panel with a unit path like [-1, 0], the flag is wrong even though no unit is actually a zero-dose control. Because the new agent-facing docs treat this field as the P(D=0) > 0 proxy, the PR can now misstate control availability before fit. Relevant locations: diff_diff/profile.py:L356-L360, diff_diff/profile.py:L533-L540, diff_diff/guides/llms-autonomous.txt:L115-L121, diff_diff/guides/llms-autonomous.txt:L228-L245, tests/test_profile_panel.py:L461-L477. Concrete fix: compute the flag from “all observed non-NaN doses for a unit equal 0” (for example, require both per-unit min and max to be 0, or an explicit all-zero check), then add a regression covering a negative/zero mixed path and asserting has_never_treated is False.

Code Quality

  • No findings beyond the Methodology item.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No separate tech-debt findings. TODO.md does not currently track the P1 above, so it remains unmitigated.

Security

  • No findings.

Documentation/Tests

  • No additional findings beyond the Methodology item. The new guide/tests are otherwise materially better aligned with the registry than in the previous review, especially on removing the unsupported “documented fix” framing for ContinuousDiD recoding.

Path to Approval

  1. Fix diff_diff/profile.py so continuous has_never_treated only counts units whose full observed dose path is zero.
  2. Add a regression in tests/test_profile_panel.py for a continuous panel containing a mixed negative/zero unit path (for example [-1, 0]) asserting that it does not count as never-treated.
  3. Recheck the new autonomous-guide wording after that fix lands, since the PR now treats has_never_treated as the zero-dose-control proxy for ContinuousDiD.

…ementation to match documented contract on continuous panels with negative doses

Reviewer correctly identified an implementation bug between the
documented contract and the current code:

- Documented (line 356-360): "some unit has treatment == 0 in
  every observed non-NaN row"
- Implemented (line 540): `bool(np.any(unit_max == 0))`

These are equivalent on binary panels (values in {0, 1}) but DIVERGE
on continuous panels with negative dose support. A unit path
`[-1, 0]` has `unit_max == 0` but is NOT a zero-dose never-treated
control — some rows are negative.

The autonomous guide's ContinuousDiD preflight now treats
`has_never_treated` as the proxy for `P(D=0) > 0`, so a false-
positive on a `[-1, 0]` path would silently mislead an agent about
control availability. Reviewer flagged this as P1.

Fixed the implementation to enforce both endpoints:
`bool(np.any((unit_max == 0) & (unit_min == 0)))`. For any numeric
panel this is exactly "every observed dose is 0" — matches the
docstring. Updated the inline comment to explain why both
endpoints are required.

Added `test_continuous_negative_then_zero_does_not_count_as_never_treated`
covering the exact bug shape: a unit path of `[-1, -1, 0, 0]` has
`unit_max == 0` but is not a never-treated control. Test asserts
`has_never_treated is False` on the panel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
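The divergence and the fix can be reproduced in a dependency-free sketch. The dose paths below mirror the regression test; the plain-dict grouping and the names `buggy`/`fixed` are illustrative stand-ins for the pandas `groupby(unit)` aggregation and the `np.any((unit_max == 0) & (unit_min == 0))` expression in `diff_diff/profile.py`:

```python
# Illustrative stand-in for the profile.py check: unit "a" carries the
# mixed negative/zero dose path from the regression test; unit "b" is
# strictly positive. Plain dicts replace the pandas groupby for brevity.
panel = {
    "a": [-1.0, -1.0, 0.0, 0.0],  # unit_max == 0, but NOT all-zero
    "b": [0.5, 0.5, 1.0, 1.0],
}

unit_max = {u: max(doses) for u, doses in panel.items()}
unit_min = {u: min(doses) for u, doses in panel.items()}

# Old check (max-only): unit "a" false-positives as never-treated.
buggy = any(unit_max[u] == 0 for u in panel)   # True -- wrong

# Fixed check: both endpoints zero <=> every observed dose is zero.
fixed = any(unit_max[u] == 0 and unit_min[u] == 0 for u in panel)  # False -- correct
```

On binary {0, 1} panels `min <= max`, so `max == 0` already implies an all-zero path, which is why the two checks only diverge once negative doses enter the support.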
@igerber

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: e712742f3faa11ceefdff91389a4ec736513f2e3


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The prior re-review P1 on has_never_treated semantics appears resolved: _classify_treatment() now requires both per-unit min and max to be 0, and a regression covers the [-1, 0] path case in diff_diff/profile.py:L533-L548 and tests/test_profile_panel.py:L480-L509.
  • I did not find new estimator-math, variance/SE, NaN-inference, or parameter-propagation defects in the touched runtime code.
  • P1: the new ContinuousDiD fallback guidance is too broad on negative-dose failures. The PR now presents HeterogeneousAdoptionDiD as an alternative when treatment_dose.dose_min <= 0, but HAD’s documented scope and validator require nonnegative doses.
  • That mismatch is duplicated across public-facing surfaces and even a new test docstring, so the PR is currently baking incorrect methodology guidance into the agent path.
  • Static review only: I could not run the touched tests here because pytest is not installed (python -m pytest -> No module named pytest).

Methodology

  • Severity: P1. Impact: affected methods are ContinuousDiD and HeterogeneousAdoptionDiD. The new profile/guide prose says that when ContinuousDiD fails because dose_min <= 0, a routing alternative is HeterogeneousAdoptionDiD. That is not consistent with the Methodology Registry or HAD’s front-door validator: HAD is defined on nonnegative dose support (D_{g,2} >= 0) and explicitly raises on negative post-period dose. A user or agent following the new guidance can be steered into an unsupported estimator path. Relevant locations: docs/methodology/REGISTRY.md:L2257-L2263, diff_diff/had.py:L1450-L1458, diff_diff/profile.py:L119-L129, diff_diff/guides/llms-autonomous.txt:L244-L249, diff_diff/guides/llms-autonomous.txt:L565-L570, tests/test_profile_panel.py:L1240-L1245, tests/test_profile_panel.py:L1279-L1281, plus the duplicated release-note surfaces in CHANGELOG.md:L13 and ROADMAP.md:L141. Concrete fix: split the failure modes. Keep HAD only for nonnegative graded-adoption panels that satisfy HAD’s own design contract. For dose_min < 0, say ContinuousDiD does not apply; either re-encode to nonnegative support with the estimand change explicitly documented, or use a method that treats dose as a signed continuous covariate. Update the duplicated prose and add a regression that forbids presenting HAD as a direct fallback in the negative-dose example.

Code Quality

  • No additional findings beyond the Methodology item.

Performance

  • No findings.

Maintainability

  • No additional findings beyond the Methodology item. The main maintainability risk is the same incorrect fallback text being duplicated across multiple files.

Tech Debt

  • No separate tech-debt findings. The P1 above is not tracked in TODO.md, so it remains unmitigated.

Security

  • No findings.

Documentation/Tests

  • No separate findings beyond the Methodology item. The new guide/tests materially improve alignment overall, but tests/test_profile_panel.py:L1240-L1245 and L1279-L1281 currently encode the same unsupported HAD fallback for negative-dose panels.
  • Static-review limitation: I could not execute the touched tests in this environment because pytest is unavailable.

Path to Approval

  1. Update diff_diff/profile.py and diff_diff/guides/llms-autonomous.txt so HAD is no longer described as a direct fallback for dose_min < 0 / negative treated-dose panels.
  2. Update the duplicated prose in CHANGELOG.md, ROADMAP.md, and the negative-dose comments/docstring in tests/test_profile_panel.py to match the HAD nonnegative-dose contract in REGISTRY.md.
  3. Add a guide regression in tests/test_guides.py asserting that the negative-dose ContinuousDiD counterexample does not route directly to HeterogeneousAdoptionDiD unless the treatment has first been re-encoded onto nonnegative support and the estimand change is called out.

…s "negative dose" branches; HAD only valid on the former

Reviewer correctly noted that the round-15/16 wording listed
`HeterogeneousAdoptionDiD` as a routing alternative whenever
`ContinuousDiD` fails on the dose-related preflights, but HAD
itself requires non-negative dose support and raises on negative
post-period dose at `had.py:1450-1459` (paper Section 2). On a
panel with `dose_min < 0`, routing to HAD silently steers an agent
into the same fit-time error. Verified the rejection at
`had.py:1450-1459`.

Reworded every site to split the two failure modes:

- Branch (a): `has_never_treated == False` (no zero-dose controls
  but all observed doses non-negative). `ContinuousDiD` does not
  apply (Remark 3.1 not implemented). HAD IS a routing alternative
  on this branch (HAD's contract requires non-negative dose,
  satisfied here); linear DiD with a continuous covariate is
  another.
- Branch (e): `dose_min < 0` (negative treated doses).
  `ContinuousDiD` does not apply AND HAD is **not** a fallback
  either — HAD raises on negative post-period dose
  (`had.py:1450-1459`). Linear DiD with a signed continuous
  covariate is the applicable alternative on this branch.
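The two-branch split above can be encoded as a small routing helper. This is a hypothetical sketch: `continuous_did_fallbacks` is not part of the library's API, and the returned strings are shorthand for the routing alternatives named in this commit, not callable estimators:

```python
def continuous_did_fallbacks(dose_min: float, has_never_treated: bool) -> list[str]:
    """Sketch of the dose-preflight routing logic; names are illustrative."""
    if dose_min < 0:
        # Branch (e): HAD raises on negative post-period dose
        # (had.py:1450-1459), so it is NOT a fallback here.
        return ["linear DiD with signed continuous covariate"]
    if not has_never_treated:
        # Branch (a): all doses non-negative but no zero-dose controls.
        # HAD's non-negative-dose contract is satisfied on this branch.
        return ["HeterogeneousAdoptionDiD",
                "linear DiD with continuous covariate"]
    return ["ContinuousDiD"]  # dose preflights pass
```

An agent following this logic never sees HAD offered on a `dose_min < 0` panel, which is exactly the property the new guide regression asserts.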

Updated wording across:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (refactored
  from item-by-item duplication into a numbered list with a single
  "Routing alternatives when (1) or (5) fails" section that splits
  the two branches; trimmed redundancy).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (split
  the When-(1)-or-(5)-fails paragraph into the two branches).
- `diff_diff/guides/llms-autonomous.txt` §4.7 trailing paragraph
  (consolidated to a pointer at §2's split discussion).
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain
  counter-example #4 (no never-treated branch: HAD applies) and
  counter-example #5 (negative-dose branch: HAD does NOT apply,
  cite `had.py:1450-1459`).
- `CHANGELOG.md` Wave 2 entry.
- `ROADMAP.md` AI-Agent Track building block.
- `tests/test_profile_panel.py` two test docstrings/comments.

Added `test_autonomous_negative_dose_path_does_not_route_to_had`
in `tests/test_guides.py` asserting that §5.2 explicitly cites
`had.py:1450-1459` on the negative-dose branch (used a single-
line fingerprint since the prose phrase "non-negative dose
support" is split across newlines in the rendered guide).
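The single-line fingerprint approach can be sketched as follows (the helper name and the sample guide text are hypothetical; only the `had.py:1450-1459` citation string comes from the actual guide):

```python
def negative_dose_branch_cites_had_rejection(guide_text: str) -> bool:
    """Hypothetical fingerprint check mirroring the guide regression."""
    # The phrase "non-negative dose support" wraps across rendered lines,
    # so the regression keys on the one-line contract citation instead.
    return "had.py:1450-1459" in guide_text

sample_guide = (
    "counter-example #5 (negative-dose branch): HAD does NOT apply;\n"
    "it raises on negative post-period dose (had.py:1450-1459).\n"
)
```

Matching a literal substring that cannot be re-wrapped keeps the test stable against future prose reflows of the surrounding paragraph.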

Length housekeeping: trimmed counter-example #4 and #5 prose +
§4.7 trailing paragraph to point at §2's split discussion;
autonomous (65374 chars) < full (66031), `test_full_is_largest`
green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 25, 2026

/ai-review

The `llms-full.txt > llms-autonomous.txt` length assertion was a
stale Wave-1 invariant. Original assumption: "full" is the
comprehensive reference and therefore the longest by definition.
Reality after Wave 1 + Wave 2 + 17 review rounds: the autonomous
reference guide has grown organically (estimator-support matrix,
per-design-feature reasoning, three worked examples, contract
citations like `had.py:1450-1459`) while `llms-full.txt` is the
API-docs roll-up and hasn't been updated in lockstep with
Wave 2's agent-facing content. The two guides serve different
audiences — there is no user-facing invariant requiring one to be
larger than the other.

The invariant was forcing artificial trims: each review round
that added a contract caveat to autonomous required compensating
prose deletion elsewhere just to keep autonomous below full. Each
trim made autonomous marginally less complete without
methodological benefit.

Removed the autonomous comparison; kept `full > concise` and
`full > practitioner` since `concise` is by definition shorter and
`practitioner` is workflow prose without the API roll-up. Added a
test docstring explaining the rationale so future editors don't
reintroduce the comparison.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
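The retained assertions could look like the following sketch. The guide bodies are stand-in strings sized to the character counts quoted above, and the test name is illustrative; only the two kept comparisons come from this commit:

```python
guides = {
    "full": "f" * 66031,          # API-docs roll-up
    "autonomous": "a" * 65374,    # agent guide: no ordering invariant vs full
    "concise": "c" * 9000,        # stand-in size: definitionally shorter
    "practitioner": "p" * 30000,  # stand-in size: workflow prose, no API roll-up
}

def test_full_dominates_derived_guides():
    # Kept invariants only: full > concise and full > practitioner.
    # Deliberately NO assertion relating full and autonomous; the two
    # guides serve different audiences, so neither bounds the other.
    assert len(guides["full"]) > len(guides["concise"])
    assert len(guides["full"]) > len(guides["practitioner"])
```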
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 872abc8bc04f3e326d2269898003f32d1d70c72b


Overall Assessment

✅ Looks good

Executive Summary

  • The prior re-review P1 appears resolved: the negative-dose branch now consistently says HeterogeneousAdoptionDiD is not a fallback, across diff_diff/profile.py:L114-L138, diff_diff/guides/llms-autonomous.txt:L244-L267 and L901-L910, CHANGELOG.md:L13-L13, ROADMAP.md:L141-L141, with regressions in tests/test_guides.py:L166-L197 and tests/test_profile_panel.py:L1229-L1294.
  • The runtime has_never_treated fix now matches the documented contract for continuous panels by requiring a candidate never-treated unit to be exactly zero in every observed row, avoiding the [-1, 0] false positive in diff_diff/profile.py:L535-L550; this is covered by tests/test_profile_panel.py:L480-L509.
  • I did not find new estimator-math, identification, weighting, variance/SE, NaN-inference, or parameter-propagation defects in the touched runtime code.
  • The new count-outcome guidance is aligned with Wooldridge Poisson’s non-negative-outcome requirement and ASF outcome-scale ATT reporting in diff_diff/profile.py:L721-L745, diff_diff/guides/llms-autonomous.txt:L188-L201 and L714-L737, consistent with docs/methodology/REGISTRY.md:L1335-L1349 and diff_diff/wooldridge.py:L1102-L1109.
  • Static review only: I could not run the touched tests here because pytest is not installed, and import-time smoke checks were also blocked because numpy is unavailable in this environment.

Methodology

  • Severity: P3 (informational, resolved prior finding). Impact: the previously reported negative-dose routing mismatch is fixed; the changed surfaces now correctly distinguish the “no never-treated, nonnegative dose” branch from the “negative-dose” branch, where HAD is inapplicable. Concrete fix: none. References: diff_diff/profile.py:L114-L138, diff_diff/guides/llms-autonomous.txt:L244-L267 and L901-L910, docs/methodology/REGISTRY.md:L2257-L2263 and L2504-L2506, diff_diff/had.py:L1446-L1459, diff_diff/continuous_did.py:L287-L294.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No new untracked tech-debt findings.

Security

No findings.

Documentation/Tests

No findings in the touched docs/tests. The new guide/profile regressions cover the prior methodology issue and the continuous has_never_treated edge case in tests/test_guides.py:L37-L219 and tests/test_profile_panel.py:L480-L509 plus L919-L1294. Static limitation: I could not execute those tests in this environment because pytest is unavailable.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 25, 2026
@igerber igerber merged commit 47d3e02 into main Apr 25, 2026
23 of 24 checks passed
@igerber igerber deleted the agent-profile-shape-extensions branch April 25, 2026 14:33
igerber added a commit that referenced this pull request Apr 25, 2026
…cross all canonical surfaces

P2 - HeterogeneousAdoptionDiD shipped (Phases 1a-3 + 3-followup + 4.5 A/B/C0,
merged via PRs #366/#367) but is missing from the new authoritative
documentation surfaces this PR establishes:
- README.md ## Estimators catalog: add one-line entry after ContinuousDiD
  (de Chaisemartin, Ciccia, D'Haultfoeuille & Knau 2026; alias HAD).
- diff_diff/guides/llms.txt ## Estimators: add matching one-liner.
- .claude/commands/docs-check.md required-estimators table: add HAD row
  pointing at had.rst as the API target.
- docs/references.rst: add new "Heterogeneous Adoption (No-Untreated
  Designs)" sub-section with the de Chaisemartin et al. (2026)
  arXiv:2405.04465v6 citation.
- docs/api/index.rst: add HeterogeneousAdoptionDiD to estimators
  autosummary; add HeterogeneousAdoptionDiDResults +
  HeterogeneousAdoptionDiDEventStudyResults to results autosummary.
- docs/api/had.rst (NEW): autoclass page for the three classes with a
  brief intro and a "When to use HAD" note pointing at sibling estimators
  (ContinuousDiD for never-treated controls, dCDH for binary reversible).
- docs/doc-deps.yaml: add diff_diff/had.py + diff_diff/had_pretests.py
  source mappings (REGISTRY methodology, had.rst api_reference, README
  catalog, references.rst, llms.txt). The llms-full.txt mapping is
  intentionally omitted with a comment - that section is deferred to the
  Phase 5 follow-up tracked in TODO.md.

TODO.md: narrow the Phase 5 entry from "llms.txt updates" to "llms-full.txt
HeterogeneousAdoptionDiD section" since the catalog one-liner, API page,
and bibliography landed here. practitioner_next_steps() integration and
tutorial notebook remain deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>